The Agent Loop
So with all those words about autonomy spectrums out of the way, let's build an agent loop.
Companion code: the post-02 tag.
For this series, I'll be building a ReAct agent on the NeMo Agent Toolkit (NAT). In this post, we'll build a basic loop with HITL approval on every tool call: no classifier yet, one basic tool, and some hello-world exercises.
We're also skipping the YAML configs so that the implementation stays embeddable and we don't lean on NAT to host our endpoints as well.
Embedding NAT by skipping the YAML
Under the hood, NAT parses its YAML into a Config model, passes that to WorkflowBuilder, and calls build(). We can drive WorkflowBuilder directly from Python and skip the CLI entirely.
from nat.builder.workflow_builder import WorkflowBuilder
from nat.data_models.component_ref import FunctionRef, LLMRef
from nat.llm.openai_llm import OpenAIModelConfig
from nat.plugins.langchain.agent.react_agent.register import ReActAgentWorkflowConfig

MAIN_LLM = LLMRef("main_llm")
SOME_TOOL = FunctionRef("some_tool")

# Runs inside an async function; `settings` and SomeToolForTheAgentConfig are placeholders.
async with WorkflowBuilder() as builder:
    # Register the LLM under the name the agent config refers to.
    await builder.add_llm(MAIN_LLM, OpenAIModelConfig(
        model_name=settings.model_name,
        base_url=settings.base_url,
        api_key=settings.api_key,
    ))
    # Register the tool, then point the ReAct agent at both.
    await builder.add_function(SOME_TOOL, SomeToolForTheAgentConfig())
    await builder.set_workflow(ReActAgentWorkflowConfig(
        tool_names=[SOME_TOOL],
        llm_name=MAIN_LLM,
        use_native_tool_calling=True,
    ))
    workflow = await builder.build()
The HITL wiring
As a sanity check, every tool call in this post pauses and waits for human approval. We'll call it ask-mode-for-everything, the absolute floor of the autonomy spectrum.
Implementing this hits a wrinkle: the ReAct agent's tool_node calls _call_tool() directly, so there's no interception point. The fix was a wrapper tool that asks for approval inside its own body:
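from datetime import UTC, datetime

from nat.builder.function_info import FunctionInfo
from nat.cli.register_workflow import register_function
from nat.data_models.function import FunctionBaseConfig

# prompt_binary_approval, tool_approval_prompt, and REJECTION_MESSAGE are project helpers.


class HITLCurrentTimeConfig(FunctionBaseConfig, name="hitl_current_datetime"):
    """Current datetime tool that requires HITL approval before executing."""


@register_function(config_type=HITLCurrentTimeConfig)
async def hitl_current_datetime(_config, _builder):
    async def _get_current_time(query: str) -> str:
        # Ask the human first; the tool body only runs on approval.
        approved = await prompt_binary_approval(
            tool_approval_prompt(
                "current_datetime",
                "get current date and time",
            )
        )
        if not approved:
            return REJECTION_MESSAGE
        return f"The current time is {datetime.now(tz=UTC).isoformat()}"

    yield FunctionInfo.from_fn(
        _get_current_time,
        description="Returns the current date/time.",
    )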
Ask-mode-for-everything: the loop stops and waits for a human before the one tool runs.
The lesson is bigger than NAT. Unit tests verify pieces; they don't verify wiring. In agent frameworks, components get connected by the runtime - not by code you can grep for. So you need three layers of tests, not one:
| Layer | What it tests | What it catches |
|---|---|---|
| Unit | Each function works in isolation | Broken function logic |
| Integration | The framework actually calls them | Wiring bugs - pieces exist but nothing connects them |
| Trajectory | Calls happen in the right order | Approval fires after execution, or tool runs twice |
For the trajectory layer, I patched both prompt_binary_approval and the datetime call to append to a shared list, then asserted on the sequence: ["approve", "datetime_now"] for an approval, ["approve"] alone for a rejection (datetime never fires). Same principle as trace-based testing, smaller scope. If you've ever shipped LangGraph or CrewAI or AutoGen and hit the class of bug where "everything works in tests but the agent ignores my guardrail," this is what it was.
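A minimal sketch of that trajectory check, using framework-free stand-ins rather than the real NAT wiring. The names gated_current_time, fake_approval, fake_now, and run_trajectory are hypothetical, and the rejection text is a placeholder; the real test patches prompt_binary_approval and the datetime call instead of injecting callables.

```python
import asyncio
from collections.abc import Awaitable, Callable

REJECTION_MESSAGE = "Tool call rejected."  # placeholder text, an assumption


async def gated_current_time(
    approve: Callable[[], Awaitable[bool]],
    now: Callable[[], str],
) -> str:
    """Stand-in for the HITL tool body: ask first, run the tool only if approved."""
    if not await approve():
        return REJECTION_MESSAGE
    return f"The current time is {now()}"


def run_trajectory(approved: bool) -> list[str]:
    """Run the gated tool once and return the ordered list of recorded calls."""
    calls: list[str] = []

    async def fake_approval() -> bool:
        calls.append("approve")
        return approved

    def fake_now() -> str:
        calls.append("datetime_now")
        return "2025-01-01T00:00:00+00:00"

    asyncio.run(gated_current_time(fake_approval, fake_now))
    return calls


# Approval must come before the tool call; a rejection must stop the tool entirely.
assert run_trajectory(approved=True) == ["approve", "datetime_now"]
assert run_trajectory(approved=False) == ["approve"]
```

Asserting on the whole list, rather than on call counts, is what catches an approval that fires after execution or a tool that runs twice.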
As an aside on observability plumbing: NAT's stock react_agent builds the agent graph without a callback handler, so the LLM tier emits no LLM_START/LLM_END events - only the tool calls reach the stream. I fork NAT's own ReAct register into src/loop/react_steps.py and attach NAT's LangchainProfilerHandler per run (the same pattern sequential_executor uses), so model latency and token-by-token streaming flow through the canonical event stream.
Owning the gateway
NAT ships nat serve - a command that consumes a YAML config and launches a FastAPI server that exposes the workflow over HTTP and WebSocket.
This series won't use it, for two reasons. First, skipping it shows that this kind of implementation can be embedded. Second, the features ahead all extend a path that runs through our code: the classifier that sits between the agent and the tool, the Pydantic policy layers that gate sandbox creation, and a planner/validator handshake that needs A2A on the back side. If nat serve is the server, those extensions become compounded indirection: "proxy NAT through our gateway, then proxy back to NAT" gymnastics.
The structural call is simpler: we use NAT as a library. We own the FastAPI surface, the routing, the auth, and the integration points. NAT is one component in that, not the whole thing.
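from collections.abc import AsyncGenerator
from contextlib import asynccontextmanager

from fastapi import FastAPI

# AgentSettings, configure_builder, and server_router are project-local.


@asynccontextmanager
async def lifespan(application: FastAPI) -> AsyncGenerator[None]:
    settings = AgentSettings()
    # The builder stays open while the app serves; the session manager shares it.
    async with WorkflowBuilder() as builder:
        config = await configure_builder(builder, settings)
        application.state.session_manager = await SessionManager.create(
            config=config, shared_builder=builder,
        )
        yield
        await application.state.session_manager.shutdown()


app = FastAPI(lifespan=lifespan)
app.include_router(server_router)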
The service layer underneath is transport-agnostic. run_agent takes a SendFn callback, not a WebSocket object, and the router passes websocket.send_json. We can reuse this pattern for every future independent agentic service we add, and each agent doesn't need to know what transport it's running over.
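To make the callback shape concrete, here is a small sketch of a transport-agnostic service function. The SendFn alias, the event dictionaries, and the echo behavior are all assumptions for illustration; the run_agent below is a stand-in, not the implementation from the companion code.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

# Hypothetical shape: any async callable that accepts one JSON-serializable dict.
SendFn = Callable[[dict[str, Any]], Awaitable[None]]


async def run_agent(prompt: str, send: SendFn) -> None:
    """Stand-in service function: emits events through `send`, never touches a socket."""
    await send({"type": "status", "text": "started"})
    await send({"type": "answer", "text": f"echo: {prompt}"})


def collect_events(prompt: str) -> list[dict[str, Any]]:
    """Drive run_agent with an in-memory sender, as a unit test might."""
    events: list[dict[str, Any]] = []

    async def send(event: dict[str, Any]) -> None:
        events.append(event)

    asyncio.run(run_agent(prompt, send))
    return events


events = collect_events("hi")
```

A WebSocket route would pass websocket.send_json as the send argument, while a test passes the list-appending sender above. The service function is the same in both cases.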
For OTel tracing, NAT has its own span pipeline (IntermediateStep → Span → OtelSpan → OTLP) that you need to wire explicitly:
await builder.add_telemetry_exporter(
    "otel",
    OtelCollectorTelemetryExporter(
        project="agent-auto-mode",
        endpoint=settings.otel_endpoint,
    ),
)
Those spans aren't just for debugging - they're the start of the audit trail the rest of the series leans on, where every agent decision stays inspectable after the fact.
Where this leaves us
What's running:
- A ReAct agent built entirely from Python. WorkflowBuilder, no YAML, no nat run.
- HITL approval on every tool call, wired inside the tool body. Three-layer tests (unit / integration / trajectory).
- Native tool calling on GLM-5.1 with thinking-mode.
- A FastAPI gateway we own, with a transport-agnostic service layer, ready for additional features.
- OTel spans on the wire.
What's lame:
- The agent has one tool. "What time is it?" is the demo. Every interesting task needs file edit, shell, grep, glob - a real surface.
- Every action prompts for approval. A single "find all TODOs" task generates dozens of tool calls. Click click click click. This is ask mode at its worst and the motivation for what's next.
So let's get to some tool-surface work.