Long-Horizon Agent State

    Most deployed agent systems are stateless function calls: a prompt goes in, an output comes out, and the context vanishes. That design is adequate for a single task that fits inside one context window. It collapses the moment work must span many sessions, many agents, and many days—because nothing the agent learned, decided, or planned survives the run that produced it.

    Long-horizon agent state is the persistent, structured state that lets agents pursue goals across multiple sessions, runs, and human interactions—not just within a single context window. The term rhymes with "long-horizon planning" from the AI research literature, but it shifts the focus from the model's planning ability to the infrastructure surrounding the model. Long-horizon here means work where state must outlive any single run: the next agent, or the next week, depends on something the last run produced.

    The thesis of this chapter is that the bottleneck for production, long-horizon agent work is state infrastructure, not model capability. A model that plans flawlessly inside 200K tokens still cannot remember what it learned last week, know which tasks are blocked versus ready, ask a human a question and wait for the answer, chain ten reviewed changes in sequence, or explain to the next agent why a decision was made. None of those are model problems. They are infrastructure problems, and they are the subject of this chapter.


    Why This Matters

    The industry conversation about agents concentrates on reasoning, tool use, and planning inside the context window—properties of the model. Production deployments fail somewhere else. They fail because a learning discovered on Monday is gone by Tuesday, because two agents working the same codebase have no shared record of what was decided, and because a multi-step goal has no representation that survives the agent that started it.

    These failures share a root cause: the system has nowhere to put state that outlives a single run. Improving the model does not fix them. Building durable, structured, mergeable state does. That reframing—from capability to infrastructure—is what makes long-horizon agent state a distinct topic rather than a footnote to context management.


    Sections

    Section Title Focus
    1 The Five-Layer Model Decomposing agent state into execution, memory, intent, coordination, and orchestration
    2 The Composition Loop How the layers close into a feedback loop, and four design principles for building them
    3 Memory and Intent Typed expertise the next agent inherits; explicit work decomposition and intent-shapes
    4 Coordination and Orchestration The shared human-agent event log and multi-run execution across many agents
    5 From Factory to Institution Forward-looking proposal for a sixth layer that feeds outcomes back into intent

    Connections

    • Context Management Architectures: Long-horizon state is the layer above the context window. Context management decides what a single run sees between turns; long-horizon state decides what survives between runs. Memory that lives outside any context window is the natural extension of treating context as a first-class architectural concern.
    • Orchestrator Pattern, Autonomous Loops, Multi-Agent Collaboration: These patterns describe how agents coordinate and chain work. The five-layer model describes the state those patterns read and write—the substrate that makes the coordination durable rather than ephemeral.
    • Mental Models: Long-horizon state operationalizes several models from Chapter 9—Context as Code (memory as versioned software), Specs as Source Code (intent as the primary artifact), and Software Factories (the assembled production system).
    • Agent Readiness: Memory is one of the four surfaces an agent-ready codebase must expose. This chapter treats that surface in depth and places it alongside the four other layers of durable state.