The Agent Loop · Matthew Gladney

Companion code: the post-02 tag.

So with all those words about autonomy spectrums out of the way, let's build an agent loop. For this series, I'm building a ReAct agent on the NeMo Agent Toolkit (NAT). This post is the skeleton: a loop with human-in-the-loop (HITL) approval on every tool call - no classifier yet, one tool, a few hello-world runs.

We're building this as an embeddable system from the start - NAT as a library we drive from Python, not a server we hand a config file.

Embedding NAT as a library

Under the hood, NAT's CLI just parses a YAML file into a Config object, hands it to WorkflowBuilder, and calls build(). We build that model directly in Python.

from nat.builder.workflow_builder import WorkflowBuilder
from nat.data_models.component_ref import FunctionRef, LLMRef
from nat.llm.openai_llm import OpenAIModelConfig
from nat.plugins.langchain.agent.react_agent.register import ReActAgentWorkflowConfig

MAIN_LLM = LLMRef("main_llm")
SOME_TOOL = FunctionRef("some_tool")

async with WorkflowBuilder() as builder:
    await builder.add_llm(MAIN_LLM, OpenAIModelConfig(
        model_name=settings.model_name,
        base_url=settings.base_url,
        api_key=settings.api_key,
    ))
    await builder.add_function(SOME_TOOL, SomeToolForTheAgentConfig())
    await builder.set_workflow(ReActAgentWorkflowConfig(
        tool_names=[SOME_TOOL],
        llm_name=MAIN_LLM,
        use_native_tool_calling=True,
    ))
    workflow = await builder.build()

The HITL wiring

As a sanity check, every tool call in this post pauses and waits for human approval - call it ask-mode-for-everything, the absolute floor of the autonomy spectrum.

Implementing this hits a wrinkle: the ReAct agent calls the tool straight from its graph - there's no node between "decide" and "execute" to hang approval on. NAT does have a function-middleware layer that wraps tools from the outside (which I'll reach for in Post 3 once we have a real tool surface). But for a single tool, the simplest interception is inside the tool itself. The first thing it does is ask for permission:

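from datetime import UTC, datetime

from nat.builder.function_info import FunctionInfo
from nat.cli.register_workflow import register_function
from nat.data_models.function import FunctionBaseConfig

# prompt_binary_approval, tool_approval_prompt, and REJECTION_MESSAGE are helpers not shown in this excerpt.

class HITLCurrentTimeConfig(FunctionBaseConfig, name="hitl_current_datetime"):
    """Current datetime tool that requires HITL approval before executing."""
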
@register_function(config_type=HITLCurrentTimeConfig)
async def hitl_current_datetime(_config, _builder):
    async def _get_current_time() -> str:
        approved = await prompt_binary_approval(
            tool_approval_prompt(
                "current_datetime",
                "get current date and time",
            )
        )
        if not approved:
            return REJECTION_MESSAGE
        now = datetime.now(tz=UTC)
        return f"The current time is {now.strftime('%Y-%m-%d %H:%M:%S %z')}"

    yield FunctionInfo.from_fn(
        _get_current_time,
        description="Returns the current date/time.",
    )

Figure: The agent dashboard paused on a human-in-the-loop approval for the single current_datetime tool, with the live agent trace on the right. Ask-mode-for-everything: the loop stops and waits for a human before the one tool runs.

The lesson is bigger than NAT. Unit tests verify pieces; they don't verify wiring. In agent frameworks, components get connected by the runtime - not by code you can grep for. So you need three layers of tests, not one:

Layer	What it tests	What it catches
Unit	Each function works in isolation	Broken function logic
Integration	The framework actually calls them	Wiring bugs - pieces exist but nothing connects them
Trajectory	Calls happen in the right order	Approval fires after execution, or tool runs twice

For the trajectory layer, I patched both prompt_binary_approval and the datetime call to append to a shared list, then asserted on the sequence: ["approve", "datetime_now"] for an approval, ["approve"] alone for a rejection (datetime never fires). Same principle as trace-based testing, smaller scope. If you've ever shipped LangGraph or CrewAI or AutoGen and hit the class of bug where "everything works in tests but the agent ignores my guardrail," this is what it was.
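
A minimal sketch of that trajectory check, assuming the approval prompt and the clock are injected as plain callables rather than patched (a swapped-in technique that keeps the example self-contained). make_tool, run_trajectory, and the REJECTION_MESSAGE wording are hypothetical stand-ins, not the companion code:

```python
import asyncio
from datetime import datetime, timezone
from typing import Awaitable, Callable

REJECTION_MESSAGE = "Tool call rejected by the user."  # hypothetical wording


def make_tool(
    approve: Callable[[], Awaitable[bool]],
    now: Callable[[], datetime],
) -> Callable[[], Awaitable[str]]:
    """Build an approval-gated time tool from injected approve/now callables."""

    async def get_current_time() -> str:
        # The gate comes first: a rejection returns before the clock is read.
        if not await approve():
            return REJECTION_MESSAGE
        return f"The current time is {now().strftime('%Y-%m-%d %H:%M:%S %z')}"

    return get_current_time


def run_trajectory(decision: bool) -> tuple[list[str], str]:
    """Run the tool once and return (ordered event log, tool result)."""
    events: list[str] = []

    async def fake_approve() -> bool:
        events.append("approve")
        return decision

    def fake_now() -> datetime:
        events.append("datetime_now")
        return datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

    result = asyncio.run(make_tool(fake_approve, fake_now)())
    return events, result


approved_events, approved_result = run_trajectory(True)
rejected_events, rejected_result = run_trajectory(False)

# The assertions are on order, not just on return values.
assert approved_events == ["approve", "datetime_now"]
assert rejected_events == ["approve"]
```

The point of the shared list is that it records order across both fakes. A test that only checked the returned string would still pass if the tool read the clock before asking for approval.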

As an aside on observability plumbing: NAT's stock react_agent builds the agent graph without a callback handler, so the LLM tier emits no LLM_START/LLM_END events - only the tool calls reach the stream. I fork NAT's own ReAct register into src/loop/react_steps.py and attach NAT's LangchainProfilerHandler per run (the same pattern sequential_executor uses), so model latency and token-by-token streaming flow through the canonical event stream.

Owning the gateway

NAT ships nat serve - a command that consumes a config and launches a FastAPI server that exposes the workflow over HTTP and WebSocket.

This series won't use it, for two reasons: to show that this kind of implementation can be embedded, and to leave room for what comes later. We need to implement the classifier that sits between the agent and the tool, create Pydantic policy layers that gate sandbox creation, and run a planner/validator handshake that needs A2A on the back side. Each of these capabilities extends a path that runs through our code. If nat serve is the server, those extensions turn into compounded indirection: "proxy NAT through our gateway, then proxy back to NAT" gymnastics.

The structural call is simpler: we use NAT as a library. We own the FastAPI surface, the routing, the auth, and the integration points. NAT is one component in that, not the whole thing.

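from collections.abc import AsyncGenerator
from contextlib import asynccontextmanager

from fastapi import FastAPI

# AgentSettings, configure_builder, server_router, and the NAT imports (WorkflowBuilder, SessionManager) are omitted here.

@asynccontextmanager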
async def lifespan(application: FastAPI) -> AsyncGenerator[None]:
    settings = AgentSettings()
    async with WorkflowBuilder() as builder:
        config = await configure_builder(builder, settings)
        application.state.session_manager = await SessionManager.create(
            config=config, shared_builder=builder,
        )
        try:
            yield
        finally:
            # Shut sessions down even if the app exits with an error.
            await application.state.session_manager.shutdown()

app = FastAPI(lifespan=lifespan)
app.include_router(server_router)

The service layer underneath is transport-agnostic. run_agent takes a SendFn callback, not a WebSocket object, and the router passes websocket.send_json. We can reuse this pattern for every future independent agentic service we add, and each agent doesn't need to know what transport it's running over.
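
A minimal sketch of that callback shape, assuming SendFn is an alias for "any async callable that accepts one JSON-like dict". The alias definition, the message fields, and the echo body of run_agent are hypothetical; only the names run_agent and SendFn come from the text above:

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical alias: any async callable that delivers one JSON-like message.
SendFn = Callable[[dict[str, Any]], Awaitable[None]]


async def run_agent(prompt: str, send: SendFn) -> None:
    """Stand-in service function: emits events through `send`, never touching a socket."""
    await send({"type": "status", "text": "started"})
    # A real implementation would drive the workflow here and forward its events.
    await send({"type": "answer", "text": f"echo: {prompt}"})


def collect(prompt: str) -> list[dict[str, Any]]:
    """Drive run_agent with an in-memory sender, as a test or another transport might."""
    sent: list[dict[str, Any]] = []

    async def send(message: dict[str, Any]) -> None:
        sent.append(message)

    asyncio.run(run_agent(prompt, send))
    return sent


messages = collect("what time is it?")
```

Because run_agent only sees the callable, the WebSocket router can hand it websocket.send_json while a test hands it a list-appending function, and the service code is the same in both cases.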

For OTel tracing, NAT has its own span pipeline (IntermediateStep → Span → OtelSpan → OTLP), and you need to wire it explicitly:

await builder.add_telemetry_exporter(
    "otel",
    OtelCollectorTelemetryExporter(
        project="overseer-in-the-loop",
        endpoint=settings.otel_endpoint,
    ),
)

Those spans aren't just for debugging - they are the immutable audit trail that the rest of this series leans on. If you recall the Privileged Identity Management (PIM) concepts from Post 1, true autonomy requires accountability. OpenTelemetry provides the receipt. Every decision, action, and context window is captured, ensuring that when the agent operates "at arm's length", its reasoning remains entirely transparent and auditable after the fact.

Where this leaves us

What's running:

A ReAct agent built entirely in Python via WorkflowBuilder - no additional CLI or server, no extra config files.
HITL approval on every tool call, wired inside the tool body.
Three-layer tests (unit / integration / trajectory).
Native tool calling on GLM-5.1 with thinking-mode.
A FastAPI gateway we own, with a transport-agnostic service layer, ready for additional features.
OTel spans on the wire.

What's lame:

The agent has one tool. "What time is it?" is the demo. Every interesting task needs file edit, shell, grep, glob - a real surface.
Every action prompts for approval. A single "find all TODOs" task generates dozens of tool calls. Click click click click. This is ask mode at its worst and the motivation for what's next.

So let's get to some tool-surface work.