Context Management for LLM Agent Systems

Integrated Context Management

This post builds on the concepts from Standardize the Agent Lifecycle, Agentic Conversation Management, Context Offload via Sub-Agent, and Git Context Memory.

In complex LLM applications, effective context management is critical for maintaining performance, coherence, and scalability. This post presents an integrated approach that combines three complementary strategies: conversation management, sub-agent offloading, and Git-based context storage—creating a comprehensive solution for handling context at multiple levels.

The Context Management Challenge

As LLM agents work on complex tasks, they face several context-related challenges:

  1. Context Window Limits: Conversations can exceed the model’s context window
  2. Performance Degradation: Long conversations increase latency and reduce response quality
  3. Cost Issues: Longer contexts mean higher API costs per call
  4. Information Loss: Important information gets lost when context is truncated
  5. Task Complexity: Complex tasks require maintaining multiple parallel contexts
  6. Persistence Needs: Context needs to be saved and retrieved across sessions

Traditional approaches like simple truncation or basic summarization are insufficient. We need a multi-faceted strategy.

Three Complementary Strategies

1. Conversation Management: Real-Time Optimization

What It Does: Cleans up and optimizes the current conversation to fit within the LLM’s context window

Scope: Works on the active conversation only, affecting what’s currently being processed by the LLM

How It’s Triggered: by the LLM itself, when token-pressure system messages show the conversation approaching the limit, or forced by the system once a critical token limit is exceeded

Available Tools: edit_message, delete_message, summarize_messages

Example:

def conversation_management_agent(conversation, management_goal, llm_call):
    # Tools the management LLM may invoke to shrink the conversation
    management_tools = [edit_message, delete_message, summarize_messages]
    # Keep editing until the goal (e.g., a target token budget) is met
    while not is_goal_achieved(conversation, management_goal):
        llm_output = llm_call(conversation, management_goal, management_tools)
        for tool_call in llm_output.tool_calls:
            tool_result = execute_tool(tool_call, management_tools)
            conversation = update_conversation(conversation, tool_result)
    return conversation
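
The management tools themselves can be thin operations over the message list. A minimal sketch, assuming a conversation is a list of {"role": ..., "content": ...} dicts; the message schema and signatures are assumptions for illustration, not a fixed API:

def edit_message(conversation, index, new_content):
    # Rewrite one message in place, e.g. to trim a verbose tool result
    conversation[index] = {**conversation[index], "content": new_content}
    return conversation

def delete_message(conversation, index):
    # Drop a message that no longer contributes to the task
    del conversation[index]
    return conversation

def summarize_messages(conversation, start, end, summary):
    # Replace a span of messages with a single summary message,
    # reclaiming tokens while preserving the key information
    conversation[start:end] = [{"role": "system", "content": f"Summary: {summary}"}]
    return conversation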

2. Sub-Agent Offloading: Task Distribution

What It Does: Delegates simple, independent tasks to separate, isolated agent instances

Scope: Offloads straightforward work to temporary agents with their own conversations

How It’s Triggered: the LLM calls the launch_subagent tool when it identifies a simple, independent subtask worth delegating

Key Benefits: the main conversation stays compact, the sub-agent works in a fully isolated context, and only a condensed result flows back to the main agent

Lifecycle:

  1. Create: Main agent dynamically generates task-specific instructions
  2. Launch: Sub-agent runs its complete lifecycle with isolated conversation
  3. Delete: Sub-agent conversation is discarded, only the condensed result is returned

Example:

def launch_subagent(task_instructions, tools):
    # Sub-agent runs with completely isolated context
    subagent_conversation = [system_message, task_instructions]
    
    # Sub-agent executes its full lifecycle independently
    result = agent_lifecycle(subagent_conversation, tools, llm_call, tool_call)
    
    # Extract only the essential result
    condensed_result = extract_final_result(result)
    
    # Sub-agent conversation is discarded here
    # Only condensed_result goes back to main agent
    return condensed_result
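
For completeness, extract_final_result can be as simple as taking the sub-agent’s last assistant message; the message format assumed below is illustrative, not prescribed by the lifecycle:

def extract_final_result(subagent_conversation):
    # Walk backwards to the sub-agent's final assistant message;
    # everything else in its conversation is about to be discarded
    for message in reversed(subagent_conversation):
        if message["role"] == "assistant":
            return message["content"]
    return "(sub-agent produced no result)"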

3. Git Context Memory: Persistent Memory Storage

What It Does: Saves context to Git for persistent memory, enabling long-term storage, history tracking, and cross-session sharing

Scope: Manages persistent memory across multiple sessions and over time, beyond the current conversation

How It’s Triggered: the LLM calls Git context tools to checkpoint important progress or to recall context saved in earlier sessions

Key Advantages: long-term storage beyond a single session, full history tracking, cross-session sharing of context, and branching/merging for parallel lines of work

Available Tools: read_context, update_context, get_context_history, get_snapshot, create_branch, merge_branch
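
The GitContextManager used in the unified implementation below can be a thin wrapper over the git CLI. A minimal sketch covering read_context, update_context, and get_context_history, assuming the context file lives in an already-initialized repository (the class and method names match the pseudocode; everything else is an assumption):

import subprocess
from pathlib import Path

class GitContextManager:
    # Persist agent context as a version-controlled file in a Git repo.

    def __init__(self, context_file, repo_dir="."):
        self.repo_dir = Path(repo_dir)
        self.context_file = self.repo_dir / context_file

    def read_context(self):
        # Return persisted context, or None if nothing was saved yet
        if self.context_file.exists():
            return self.context_file.read_text()
        return None

    def update_context(self, content, message="Checkpoint agent context"):
        # Write the new context and commit it, so every update is recoverable
        self.context_file.write_text(content)
        subprocess.run(["git", "add", self.context_file.name],
                       cwd=self.repo_dir, check=True)
        subprocess.run(["git", "commit", "-m", message],
                       cwd=self.repo_dir, check=True)

    def get_context_history(self, limit=10):
        # Recent checkpoint messages, newest first
        out = subprocess.run(
            ["git", "log", f"-{limit}", "--format=%h %s", "--",
             self.context_file.name],
            cwd=self.repo_dir, check=True, capture_output=True, text=True)
        return out.stdout.splitlines()

The remaining tools (get_snapshot, create_branch, merge_branch) would wrap git show, git branch, and git merge in the same style.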

Unified Implementation

Here’s how to integrate all three strategies into a cohesive agent system:

def integrated_agent_lifecycle(system_message, user_message, llm_call, tool_call):
    # Note: system_message should instruct the LLM about:
    # - When to use Git context tools (e.g., checkpoint important progress, recall past context)
    # - When to delegate tasks to sub-agents (e.g., simple independent tasks)
    # - How to respond to token pressure information
    
    conversation = [system_message, user_message]
    
    # Git context tools available to LLM
    git_context = GitContextManager('context.txt')
    
    # Load persisted context from Git if it exists
    persisted_context = git_context.read_context()
    if persisted_context:
        conversation.append(create_message(
            role="system",
            content=persisted_context
        ))
    
    # All tools available to LLM
    tool_set = [
        # Conversation management tools
        edit_message, delete_message, summarize_messages,
        # Sub-agent offload tool
        launch_subagent,
        # Git context memory tools
        read_context, update_context, get_context_history, 
        get_snapshot, create_branch, merge_branch,
        # Other domain tools
        ...
    ]
    
    while True:
        # System forces conversation management if critical limit exceeded
        if must_reduce_conversation(conversation):
            conversation = conversation_management_agent(
                conversation,
                management_goal="reduce_to_safe_length",
                llm_call=llm_call
            )
        
        # System can add token pressure info for LLM awareness
        current_tokens = estimate_tokens(conversation)
        if approaching_token_limit(current_tokens):
            conversation.append(create_message(
                role="system",
                content=f"Token pressure: {current_tokens}/{max_tokens}"
            ))
        
        # LLM processes conversation and decides which tools to call
        # LLM can see token pressure info and call conversation management tools
        llm_output_messages = llm_call(conversation, tool_set)
        conversation.extend(llm_output_messages)
        
        # Execute any tool calls made by LLM
        tool_call_messages = [m for m in llm_output_messages
                              if is_tool_call(m)]  # helper: does message request a tool?
        if tool_call_messages:
            for tool_call_message in tool_call_messages:
                tool_result_message = tool_call(tool_call_message, tool_set)
                conversation.append(tool_result_message)
            # Tools could be: conversation management, sub-agent, Git context, or domain tools
        else:
            break
    
    return conversation
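
The token-accounting helpers are left abstract in the pseudocode. One plausible sketch, using a rough characters-per-token heuristic and illustrative thresholds (all three constants are assumptions, not prescribed values):

MAX_TOKENS = 128_000   # illustrative context window size
SOFT_LIMIT = 0.75      # start warning the LLM at 75% usage
HARD_LIMIT = 0.90      # force conversation management at 90% usage

def estimate_tokens(conversation):
    # Crude heuristic: roughly 4 characters per token on average
    total_chars = sum(len(str(message)) for message in conversation)
    return total_chars // 4

def approaching_token_limit(current_tokens, max_tokens=MAX_TOKENS):
    return current_tokens > SOFT_LIMIT * max_tokens

def must_reduce_conversation(conversation, max_tokens=MAX_TOKENS):
    return estimate_tokens(conversation) > HARD_LIMIT * max_tokens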

How the Strategies Work Together

All three strategies are invoked by the LLM calling the appropriate tools. Conversation Management has two triggering modes:

Agent Session
├── Conversation Management
│   ├── LLM calls tools (aware of token pressure from system messages)
│   └── System forces when critical token limit exceeded
├── Sub-Agent Offload
│   └── LLM calls tool to delegate tasks
└── Git Context Memory
    └── LLM calls tools to checkpoint/recall context

Example Flow:

  1. Main agent works on a task, conversation grows
  2. System adds token pressure message → LLM sees it and calls conversation management tools
  3. LLM calls sub-agent tool to delegate a simple independent subtask
  4. If conversation exceeds critical limit → System forces conversation management
  5. LLM calls Git tools to checkpoint important progress
  6. Later session: LLM calls Git tools to recall previous context

Example: Coding Agent

A natural application of Git context memory is in coding agents, where the codebase itself serves as persistent memory:

Code as Context Memory: Each code update is like a context commit. When the agent writes code to a file and commits it with git_commit(files=["main.py"], message="Fixed authentication bug"), it’s checkpointing both the work and the reasoning.

Recalling Past Work: The agent can use Git tools to learn from history, calling git_log() to scan past commit messages for related attempts and git_show() to read the code from a specific commit.

Example Flow: Agent receives task “Add authentication” → Calls git_log() to check if it was tried before → Finds commit “Attempted authentication, failed due to X” → Reads that code with git_show() → Learns from the mistake → Implements better solution → Commits with message for future recall.
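
The Git tools in this flow can again be thin CLI wrappers. A sketch with assumed signatures, matching how git_commit, git_log, and git_show are used above:

import subprocess

def git_log(limit=20, repo_dir="."):
    # Recent commit messages: the agent scans these for earlier attempts
    out = subprocess.run(["git", "log", f"-{limit}", "--format=%h %s"],
                         cwd=repo_dir, check=True,
                         capture_output=True, text=True)
    return out.stdout

def git_show(commit, repo_dir="."):
    # Full diff and message of a past commit, so the agent can read the old code
    out = subprocess.run(["git", "show", commit],
                         cwd=repo_dir, check=True,
                         capture_output=True, text=True)
    return out.stdout

def git_commit(files, message, repo_dir="."):
    # Checkpoint work plus reasoning: the commit message is recallable memory
    subprocess.run(["git", "add", *files], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)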

This makes Git context memory especially natural for coding agents, where code files are both the work product and the persistent memory.

Conclusion

The integrated context management approach presented here combines three powerful strategies into a cohesive system:

  1. Conversation Management: optimizes the active conversation in real time so it fits the context window
  2. Sub-Agent Offloading: delegates simple, independent tasks to isolated agents that return only condensed results
  3. Git Context Memory: persists context across sessions with full history tracking

Together, these strategies create a robust, scalable solution for managing context in sophisticated LLM agent systems. The approach adapts to varying needs, supports multi-user collaboration, preserves important information, and enables agents to work effectively even on long-running tasks.

By addressing context management with three complementary strategies—context window optimization, task delegation, and persistent memory storage—we build agent systems that are both powerful and maintainable, capable of handling the demanding requirements of real-world applications while remaining comprehensible and debuggable.