AgentBase: Designing a Full-Agent Lifecycle with Factory, Runtime, and Observer

Agent systems become easier to reason about when we break them into clean, reusable stages. Building on the ideas from “LLM Generates Tokens, Agent Generates Messages, AgentLauncher Generates Agents,” this post introduces AgentBase: a three-stage design pattern that stretches from agent creation to performance evaluation and back again.

Why AgentBase?

Modern AI projects cycle continually through ideation, deployment, and refinement. AgentBase formalizes that loop so every iteration is driven by structured, measurable feedback rather than ad-hoc tweaks.

The key insight is to treat every agent as the product of three collaborating roles:

  1. AgentFactory – constructs a runnable agent workflow from specs.
  2. AgentRuntime – executes the workflow on a concrete task input.
  3. AgentObserver – scores the outcome and feeds improvements back into the factory.
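
A minimal sketch of that contract, assuming Python and purely illustrative class and method names (AgentBase does not prescribe a concrete API), might look like this:

    from __future__ import annotations

    from typing import Any, Protocol

    # Illustrative only: names and signatures are assumptions, not a published API.
    class AgentFactory(Protocol):
        def build(self, task_description: str, sample_inputs: list[str],
                  measurement_plan: dict[str, Any],
                  previous_report: dict[str, Any] | None = None) -> dict[str, Any]:
            """Return an agent workflow spec (prompt, tools, runtime config)."""
            ...

    class AgentRuntime(Protocol):
        def run(self, spec: dict[str, Any], task_input: str) -> dict[str, Any]:
            """Execute the spec on a real task input and return an execution record."""
            ...

    class AgentObserver(Protocol):
        def observe(self, spec: dict[str, Any], task_description: str,
                    record: dict[str, Any],
                    measurement_plan: dict[str, Any]) -> dict[str, Any]:
            """Score the record against the measurement plan and emit a report."""
            ...

Plain dictionaries stand in here for the three artifacts described next; typed dataclasses work just as well.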

Key Artifacts

Before drilling into each stage, we anchor the conversation around three core artifacts.

Agent Workflow Spec (Factory Output)

The factory produces a structured blueprint that downstream stages can execute without guessing. At minimum it should capture:

  - The agent system prompt, encoding task goals, safety and policy constraints, and the expected shape of inputs and outputs.
  - The roster of tools the agent is allowed to call.
  - The runtime configuration needed to execute the prompt reliably.

The spec itself should be serialized as YAML or JSON so runtimes can consume it directly without bespoke parsers.
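
As a sketch, with field names chosen to mirror the bullets above rather than any fixed schema, the spec could be a small dataclass dumped to JSON:

    import json
    from dataclasses import asdict, dataclass, field

    # Field names are illustrative; adapt them to your own spec schema.
    @dataclass
    class AgentWorkflowSpec:
        system_prompt: str                                   # goals, constraints, I/O shape
        tools: list = field(default_factory=list)            # tool roster / manifest
        runtime_config: dict = field(default_factory=dict)   # model choice, loop limits, etc.

    spec = AgentWorkflowSpec(
        system_prompt="Summarize each support ticket into exactly three bullet points.",
        tools=[{"name": "search_kb", "description": "Look up internal knowledge-base articles"}],
        runtime_config={"model": "<your-model>", "max_iterations": 8},
    )
    print(json.dumps(asdict(spec), indent=2))                # ready for any runtime to consume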

Runtime Execution Record (Runtime Output)

While running a task, the runtime packages everything the observer, and any future human reader, needs to understand what happened:

  - The task input exactly as the agent received it.
  - The full conversation trace, covering every LLM call and tool invocation.
  - The final output produced for the task.
  - Telemetry such as token usage, latency, and cost.

This record becomes the observer's raw material and feeds future rounds back into the factory.
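
A matching sketch of the record, again with assumed field names:

    from dataclasses import dataclass, field

    # Illustrative shape for the runtime execution record.
    @dataclass
    class RuntimeExecutionRecord:
        task_input: str                                          # the raw input the agent received
        conversation_trace: list = field(default_factory=list)   # every LLM call and tool invocation
        final_output: str = ""
        telemetry: dict = field(default_factory=dict)            # token usage, latency, cost

    record = RuntimeExecutionRecord(
        task_input="Ticket #4821: customer cannot reset their password",
        conversation_trace=[{"role": "assistant", "content": "Calling search_kb..."}],
        final_output="- Customer locked out\n- Reset link expired\n- Needs manual unlock",
        telemetry={"prompt_tokens": 812, "completion_tokens": 96, "latency_s": 3.4, "cost_usd": 0.011},
    )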

Observer Report (Observer Output)

The observer evaluates the runtime record using the agreed measurement plan and emits:

  - A score for each dimension the measurement plan defines.
  - Findings that tie those scores to concrete evidence in the trace and telemetry.
  - Improvement hints the factory can act on in the next iteration.

This report becomes both the artifact presented to stakeholders and the feedback payload that the factory can ingest for iterative improvement.
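
Sketched the same way, with a ready_to_ship flag standing in for the "improve vs. ship" decision shown later in the feedback loop:

    from dataclasses import dataclass, field

    # Illustrative shape; score dimensions come from the user-supplied measurement plan.
    @dataclass
    class ObserverReport:
        scores: dict = field(default_factory=dict)             # e.g. {"accuracy": 0.78}
        findings: list = field(default_factory=list)           # evidence pulled from the trace
        improvement_hints: list = field(default_factory=list)  # feedback the factory can ingest
        ready_to_ship: bool = False

    report = ObserverReport(
        scores={"accuracy": 0.78, "latency_s": 3.4},
        findings=["Two of ten sample tickets were summarized in four bullets instead of three."],
        improvement_hints=["Tighten the system prompt to enforce exactly three bullet points."],
    )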

Flow of Inputs and Artifacts

With those artifacts defined, the AgentBase contract expects the following baseline inputs:

  - A task description stating what the agent must accomplish.
  - Representative test inputs, plus optional expected outputs.
  - A measurement plan defining how success will be judged.

Every stage adds to or transforms this information:

AgentFactory
  Consumes: task description, test inputs, optional outputs, measurements, optional previous report
  Produces: agent spec (workflow + configuration)

AgentRuntime
  Consumes: agent spec, real task input
  Produces: task output plus execution trace

AgentObserver
  Consumes: agent spec, task description, task input, task output, measurement plan
  Produces: performance report with improvement hints

When improvements are needed, the observer’s report travels back to the factory as an extra input, and the loop iterates.

Stage 1: AgentFactory

An AgentFactory instance starts life with its own system prompt that states the factory’s charter: absorb the task brief, resist overfitting to narrow examples, and emit a reusable agent workflow spec. With that context in place, the factory executes a predictable loop:

  1. Ingest the task description, representative inputs, optional outputs, measurement criteria, and any prior observer report.
  2. Synthesize an agent system prompt that encodes task goals, safety and policy constraints, and the expected shape of raw string inputs and structured outputs.
  3. Assemble the tooling roster and runtime configuration needed to execute the prompt reliably.

The resulting specification bundles the synthesized system prompt, the tooling roster, and the runtime configuration into a single, reusable agent spec.

Inputs can include the observer’s last report. If a previous iteration flagged slow tool calls or low accuracy, the factory bakes new heuristics or alternative tool selections into the next spec.

Think of the factory as a compiler: it ingests requirements and diagnostics, then outputs an optimized agent blueprint.
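
A minimal factory sketch along those lines, assuming a caller-supplied call_llm(prompt) helper (a stand-in for whatever model client you use) that returns the model's text:

    import json

    # The factory's own charter, per the loop above; wording here is illustrative.
    FACTORY_CHARTER = (
        "You design reusable agent workflows. Given a task brief, sample inputs, "
        "measurement criteria, and any prior performance report, return a JSON spec "
        "with system_prompt, tools, and runtime_config. Do not overfit to the samples."
    )

    def build_agent_spec(call_llm, task_description, sample_inputs,
                         expected_outputs=None, measurement_plan=None,
                         previous_report=None):
        """Compile requirements (and prior diagnostics) into an agent workflow spec."""
        brief = {
            "task_description": task_description,
            "sample_inputs": sample_inputs,
            "expected_outputs": expected_outputs,
            "measurement_plan": measurement_plan,
            "previous_report": previous_report,   # observer feedback from the last round, if any
        }
        raw = call_llm(FACTORY_CHARTER + "\n\n" + json.dumps(brief))
        return json.loads(raw)                    # {"system_prompt": ..., "tools": ..., "runtime_config": ...}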

Stage 2: AgentRuntime

AgentRuntime owns execution. It consumes the factory artifacts wholesale—system prompts, tool manifests, runtime configuration—and spins up a real agent ready to process any task input. Unlike the factory’s sample-based view, the runtime must accept the entire task payload, performing any required pre-processing before handing it to the agent loop. A typical cycle looks like:

  1. Instantiate the agent per the spec, wiring in prompts, tools, and runtime configuration.
  2. Feed the full task input into the conversation (optionally in parallel when multiple inputs arrive), initializing system/user messages accordingly.
  3. Execute the iterative loop, orchestrating LLM calls and tool invocations as defined.
  4. Capture the conversation trace, final output, and real-time metrics (token usage, latency, cost), packaging them into the runtime execution record.

Runtime does not judge quality—that’s the observer’s job. Its responsibility is fidelity: the trace must capture enough detail to reproduce decisions later and support parallel task handling when needed.
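
A stripped-down version of that cycle, assuming placeholder call_llm(messages, tools) and run_tool(request) helpers that wrap your own model client and tool layer, and a spec shaped like the factory output above:

    import time

    def run_agent(call_llm, run_tool, spec, task_input, max_iterations=8):
        """Execute one task input against the spec and return an execution record."""
        messages = [
            {"role": "system", "content": spec["system_prompt"]},
            {"role": "user", "content": task_input},
        ]
        trace, started = [], time.time()
        reply = {}
        for _ in range(max_iterations):
            reply = call_llm(messages, tools=spec.get("tools", []))
            messages.append(reply)
            trace.append(reply)
            if reply.get("tool_call"):                        # the agent asked for a tool
                tool_msg = {"role": "tool", "content": run_tool(reply["tool_call"])}
                messages.append(tool_msg)
                trace.append(tool_msg)
                continue
            break                                             # final answer reached
        return {
            "task_input": task_input,
            "conversation_trace": trace,
            "final_output": reply.get("content", ""),
            "telemetry": {"latency_s": round(time.time() - started, 2)},
        }

Token counts and cost would come from whatever your model client reports; they are omitted from this sketch.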

Stage 3: AgentObserver

AgentObserver closes the loop. It ingests the agent spec, task description, and the full runtime execution record (task input, conversation trace, outputs, telemetry) alongside the user-provided measurement plan, then analyzes every detail strictly along the dimensions that plan specifies—nothing more, nothing less.

Observer deliverables:

  - Scores for every dimension named in the measurement plan.
  - Evidence from the conversation trace and telemetry that backs those scores.
  - Improvement hints the factory can ingest on the next pass.

The report flows back to the factory. Depending on its feedback, the next iteration could refine prompts, add fallback strategies, or even escalate to a human review step.
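
One way to sketch that, again with a placeholder call_llm helper; the important constraint is that the prompt restricts scoring to the dimensions the measurement plan names:

    import json

    # Wording is illustrative; the key is the "nothing more, nothing less" constraint.
    OBSERVER_CHARTER = (
        "You evaluate a completed agent run. Score it ONLY on the dimensions listed "
        "in the measurement plan and return JSON with scores, findings, "
        "improvement_hints, and ready_to_ship."
    )

    def observe(call_llm, spec, task_description, record, measurement_plan):
        """Score one execution record against the measurement plan."""
        payload = {
            "agent_spec": spec,
            "task_description": task_description,
            "execution_record": record,           # input, trace, output, telemetry
            "measurement_plan": measurement_plan,
        }
        raw = call_llm(OBSERVER_CHARTER + "\n\n" + json.dumps(payload))
        return json.loads(raw)                    # the observer report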

The AgentBase Feedback Loop

task description + measurement plan
            ↓
       AgentFactory
            ↓ (agent spec)
       AgentRuntime
            ↓ (outputs + trace)
       AgentObserver
            ↓ (performance report)
        ┌───────────────┐
        │  Improve?     │
        │   Yes → back  │
        │   No  → ship  │
        └───────────────┘

Key properties:

  - Each stage has a single responsibility: the factory designs, the runtime executes, the observer evaluates.
  - Every hand-off travels through an inspectable artifact: the agent spec, the execution record, and the performance report.
  - The loop exits only when the observer's report indicates the agent is ready to ship; otherwise the report re-enters the factory.
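
Wiring the stages together is then a short driver loop; the function and key names below follow the sketches above and are assumptions rather than a fixed API:

    def agentbase_loop(factory, runtime, observer, task, max_rounds=3):
        """Iterate factory -> runtime -> observer until the report says ship."""
        spec, record, report = None, None, None
        for _ in range(max_rounds):
            spec = factory(task["description"], task["sample_inputs"],
                           measurement_plan=task["measurement_plan"],
                           previous_report=report)            # feedback re-enters here
            record = runtime(spec, task["real_input"])
            report = observer(spec, task["description"], record,
                              task["measurement_plan"])
            if report.get("ready_to_ship"):                   # "Improve? No" -> ship
                break
            # "Improve? Yes" -> the report rides back into the next factory round
        return spec, record, report

In practice factory, runtime, and observer would be partials over your own call_llm and run_tool helpers, for example functools.partial(build_agent_spec, call_llm).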

Practical Tips

  - Serialize agent specs as YAML or JSON so any runtime can consume them without bespoke parsers.
  - Keep the measurement plan explicit; the observer should score nothing more and nothing less.
  - Make runtime traces detailed enough to reproduce decisions later; fidelity during execution pays off during evaluation.

Wrapping Up

AgentBase reframes “build an agent” as “design a loop.” By separating creation, execution, and evaluation, you gain:

  - Reusable agent specs that can be regenerated as requirements change.
  - Execution records detailed enough to reproduce and debug any run.
  - Evaluation reports that feed directly into the next round of improvements.

Whether you’re orchestrating a single assistant or a fleet of specialized agents, this pattern keeps every improvement grounded in data and structured feedback. Ready for the next turn? Feed the observer’s report back into the factory and keep the loop spinning.