Context Management Strategies | Agentic Engineering

Practical approaches for managing context windows, balancing injection vs. retrieval, and structuring context for optimal performance.

Managing Context Window Limits

Remaining context serves as a rough proxy for remaining model capability. As a heuristic, a model at 25% context utilization retains about 75% of its effective capability; at 90% utilization, that drops to roughly 10%.

Current Practice: Boot Fresh Agents

[2025-12-10]: Context compaction via secondary agents is not a viable strategy in current implementations (December 2025). When agents near context limits or output quality degrades, boot a fresh instance rather than attempting salvage through compression.

Tooling support improves this workflow. Claude Code's hook system can automatically log agent actions—reads, writes, tool calls—enabling rapid reconstruction of relevant context for successor agents. This makes "boot fresh" practical even for complex workflows.
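A minimal sketch of the logging idea, assuming a hook script receives each tool event as a dictionary. The field names `tool_name`, `tool_input`, and `file_path`, the tool names `Read`, `Write`, and `Edit`, and both helper functions are illustrative assumptions, not a confirmed Claude Code schema; check the hook payload of your own setup before relying on them.

```python
import json
import tempfile
from pathlib import Path


def record_action(event: dict, log_path: Path) -> None:
    """Append one tool event to a JSON Lines log (one JSON object per line)."""
    tool_input = event.get("tool_input") or {}
    entry = {
        "tool": event.get("tool_name", "unknown"),  # assumed payload field
        "target": tool_input.get("file_path"),      # assumed payload field
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def build_handoff(log_path: Path) -> dict:
    """Summarize the log into files read and files written, in first-seen order."""
    reads, writes = [], []
    for line in log_path.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        target = entry.get("target")
        if not target:
            continue  # e.g. shell commands with no file target
        if entry["tool"] == "Read":
            bucket = reads
        elif entry["tool"] in {"Write", "Edit"}:
            bucket = writes
        else:
            continue
        if target not in bucket:
            bucket.append(target)
    return {"files_read": reads, "files_written": writes}


# Usage: log a few events to a temporary file, then summarize for a successor.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "actions.jsonl"
    record_action({"tool_name": "Read", "tool_input": {"file_path": "auth.py"}}, log)
    record_action({"tool_name": "Edit", "tool_input": {"file_path": "auth.py"}}, log)
    record_action({"tool_name": "Bash", "tool_input": {"command": "pytest"}}, log)
    handoff = build_handoff(log)
```

The resulting summary is small enough to inject into a successor agent's priming payload, which is what makes restarting cheap.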

When Compaction Makes Sense

See Frequent Intentional Compaction below for the exception: proactive compression at 40-60% utilization as a deliberate quality maintenance strategy, not emergency salvage.

Context Compression and Summarization

Compression works in conversational exchanges where information density matters less than continuity. Technical work like coding demands precision—context compression typically introduces gaps that manifest as "context rot" or "context bloat."

The "one agent, one task" principle applies: if an agent can't complete a task within its context budget, adjust the prompt or scope. Chaining multiple agents to compact and summarize context invites hallucination without guaranteeing quality preservation.

Architecture-Centric Approaches

[2026-03-09]: Deterministic context management architectures (LCM, Sapling) formalize compaction into engineered systems rather than emergency measures. See Context Management Architectures for a comparative survey of lossless preservation, continuous curation, and the passive baseline.

Injection vs. Retrieval Balance

Injection for Priming

Context injection typically occurs at session start to prime agent behavior. Common injection targets include:

Base configuration (CLAUDE.md, project structure)
Recent activity context (git history, previous session summaries)
Domain knowledge (relevant documentation, coding standards)
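One way to assemble the injection targets above into a single priming payload is sketched below. The function name, the section contents, and the use of character count as a crude stand-in for a token budget are all assumptions made for illustration.

```python
def build_priming_payload(sections: list[tuple[str, str]], max_chars: int) -> str:
    """Join titled sections in priority order, skipping any that exceed the budget.

    Character count is used as a rough proxy for tokens (an assumption).
    """
    parts, used = [], 0
    for title, body in sections:
        block = f"## {title}\n{body.strip()}\n\n"
        if used + len(block) > max_chars:
            continue  # skip sections that do not fit the remaining budget
        parts.append(block)
        used += len(block)
    return "".join(parts)


# Usage: the oversized third section is dropped, the first two are kept.
payload = build_priming_payload(
    [
        ("Base configuration", "See CLAUDE.md: run tests with `make test`."),
        ("Recent activity", "Last commit: fix token refresh bug."),
        ("Domain knowledge", "x" * 500),
    ],
    max_chars=200,
)
```

Listing sections in priority order means the budget is spent on base configuration first, with bulkier domain material included only when it fits.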

Retrieval for Discovery

Agent-driven retrieval works well for codebase exploration, particularly in large repositories. The agent discovers relevant files based on task requirements rather than receiving a predetermined payload.

The Balance Point

Codebase scale determines the injection-retrieval balance. Small codebases tolerate more retrieval (agents find relevant files efficiently). Large codebases risk context bloat from excessive exploration—agents read tangential files searching for relevant content.

Open Questions

Areas requiring deeper exploration:

What constitutes optimal "priming" payload for typical coding sessions? CLAUDE.md alone? Recent git history? Documentation excerpts?
When retrieval produces context bloat (irrelevant file reads), can mid-session intervention recover quality, or does it require agent restart?
Does a codebase size threshold exist where injection-heavy approaches become mandatory due to retrieval unreliability?

Structured Context Patterns

Structured context (markdown, JSON, XML) consistently outperforms unstructured prose. Structure provides:

Focus mechanisms — Headers, sections, and delimiters guide attention
Parsing support — Both models and humans parse structured content more reliably
Prompt engineering aid — Templates and schemas assist both human developers and meta-prompting agents

Markdown offers the best balance: human-readable, model-friendly, and widely supported. JSON and XML work for machine-readable payloads where parsing guarantees matter.

Understanding Context Rot

Observable Symptoms

Context rot manifests as quality degradation distinct from simple model errors. The symptoms still require characterization, but likely include:

Inconsistent outputs despite identical prompts
Drift from original task specifications
Increasing reliance on content at the edges of the context window
Degraded reasoning chains compared to early-session outputs

Reversibility Question

Whether context rot can be reversed within a session or requires agent restart remains uncharacterized. Current practice defaults to "boot fresh" when symptoms appear, suggesting low confidence in within-session recovery.

Distinction from Model Confusion

Context rot differs from standard model errors or confusion. Model errors stem from capability limits or ambiguous prompts. Context rot results from degraded working memory—the model's capability remains intact, but its operational context has compromised output quality.

Context Window as Finite Resource

[2026-02-06]: The GSD project (a 12K-star open source tool) treats the context window as a non-renewable resource with an explicit quality relationship: Quality ∝ 1/(% context used). Under this model, quality falls as the context fills. Their mitigation: each plan execution gets a fresh 200K context window and is sized to stay below 50% utilization, so the main context never accumulates degradation. In this view, context rot emerges from resource depletion: once context fills, quality cannot be recovered through compression, only through fresh allocation. This aligns with the "boot fresh agents" guidance above but adds a quantified threshold: target <50% utilization for sustained quality, not just "avoid 95%."
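As a rough illustration of the sizing rule, the sketch below checks whether a plan's estimated token footprint keeps a fresh window under the 50% target. Only the 200K window and the <50% target come from the text; the function name and the example estimates are hypothetical, and this is not GSD's actual implementation.

```python
def fits_fresh_window(
    estimated_tokens: int,
    window_tokens: int = 200_000,
    max_utilization: float = 0.5,
) -> bool:
    """Return True if the plan stays strictly below the utilization target."""
    if estimated_tokens < 0 or window_tokens <= 0:
        raise ValueError("estimate must be non-negative and window positive")
    return estimated_tokens / window_tokens < max_utilization


# Usage: an 80K-token plan fits; a 120K-token plan should be split first.
small_plan_ok = fits_fresh_window(80_000)
large_plan_ok = fits_fresh_window(120_000)
```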

Context Window Percentage Monitoring

[2026-01-17]: Claude Code 2.1.9 introduced real-time context utilization percentage display, transforming context management from guesswork to data-driven decision-making.

Display Format

The session header shows context usage as:

[45K/200K tokens] 22%

This updates in real-time as conversation progresses, accounting for both input and output tokens consumed.

Operational Thresholds

Range	Signal	Recommended Action
0-30%	Healthy	Continue normally
30-60%	Monitor	Good checkpoint for intentional compaction
60-80%	Caution	Consider fresh session if major phase ends
80-95%	Warning	Begin graceful task wrap-up
95%+	Critical	Boot new agent immediately

These thresholds complement the frequent intentional compaction strategy described below. The 30-60% "Monitor" range contains the 40-60% compaction trigger.
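The threshold table can be expressed as a small lookup. In this sketch the function name is hypothetical, and because the table's ranges share endpoints, assigning a boundary value such as 30% to the higher band is an assumption.

```python
def classify_utilization(percentage: float) -> tuple[str, str]:
    """Map a context utilization percentage to (signal, recommended action)."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percentage < 30:
        return ("Healthy", "Continue normally")
    if percentage < 60:
        return ("Monitor", "Good checkpoint for intentional compaction")
    if percentage < 80:
        return ("Caution", "Consider fresh session if major phase ends")
    if percentage < 95:
        return ("Warning", "Begin graceful task wrap-up")
    return ("Critical", "Boot new agent immediately")


# Usage: the 22% reading from the display example is in the healthy band.
signal, action = classify_utilization(22)
```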

Percentage-Driven Decision Framework

Instead of guessing when context nears capacity:

Set mental alert at 60%: "Should I compact or continue?"
At natural break points: Compact if above 50%
At 80%: Stop accepting new work, finish current task
At 95%: Force new session (no negotiation)

This framework removes ambiguity from the "when to compact" decision. The percentage provides objective data; the thresholds provide clear action triggers.

Multi-Agent Context Tracking

In orchestrated workflows, track per-agent percentages to enable proactive scheduling:

# Example subagent completion output
{
    "agent": "builder-agent",
    "context_used": 156000,
    "context_limit": 200000,
    "percentage": 78,
    "recommendation": "boot-fresh-next-task"
}

This enables capacity-aware task routing:

Route new work to agents with headroom
Reboot agents approaching limits before task assignment
Predict whether an agent can complete a task within remaining capacity

Integration with Hooks

PostToolUse hooks can monitor percentage progression automatically:

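import logging

def post_tool_use(event: dict) -> None:
    # "context_percentage" is an assumed field name; adapt it to your hook payload.
    percentage = event.get("context_percentage", 0)
    if percentage > 75:
        logging.warning("Context at %s%%: consider compaction", percentage)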

This provides operational awareness without manual monitoring, surfacing warnings at configurable thresholds.

Practical Application

The percentage display transforms context management from reactive ("the agent seems confused") to proactive ("we're at 65%, compact before the next task").

Combined with the compaction strategies below, percentage monitoring enables:

Evidence-based compaction timing — Compact at 50% rather than guessing
Predictable session planning — Know when you'll need a fresh agent
Quality maintenance — Intervene before degradation, not after

Frequent Intentional Compaction

[2025-12-10]: Most teams compact context reactively—when the agent hits 95% capacity and auto-summarization kicks in as an emergency measure. By then, quality has already degraded. Frequent intentional compaction flips this: compact proactively at 40-60% utilization to maintain quality, not salvage it.

The Pattern: Instead of waiting for context limits to force compression, compact deliberately and frequently throughout a session. Target 40-60% context utilization as your compaction trigger, not 90%+.

Optimization Priority Order:

Correctness — Preserve factual accuracy above all else
Completeness — Ensure all critical information survives compaction
Signal-to-noise — Remove redundancy, keep high-value context
Trajectory — Maintain the narrative thread of what's been done and why

This ordering matters. Emergency compaction at 95% often sacrifices correctness for brevity. Intentional compaction at 50% can optimize all four dimensions without forced trade-offs.

Concrete Technique: Status-to-Plan Compaction

One practical pattern is compacting status updates back into plan documents:

# Before (context bloat)
## Plan
- Implement user authentication
- Add database schema
- Create API endpoints
 
## Status Updates
- [14:23] Started auth implementation
- [14:45] Auth working, moving to database
- [15:12] Database schema complete
- [15:30] Schema had bug, fixed
- [15:45] API endpoints half done
...

# After (intentional compaction)
## Plan
- ✓ Implement user authentication — Complete, no issues
- ✓ Add database schema — Complete after fixing validation bug
- ⧗ Create API endpoints — In progress (3/5 complete)
 
## Current Work
Working on remaining API endpoints (POST /users, DELETE /users)

The compacted version preserves correctness (what's done), completeness (including the bug fix), signal (current state), and trajectory (what's next)—all in a fraction of the context.
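The folding step can be sketched as a small renderer. It assumes the timestamped updates have already been reduced to one final state and note per plan item, by the agent or a human; the function name, state labels, and colon separator are illustrative choices.

```python
MARKS = {"done": "✓", "in_progress": "⧗"}


def compact_plan(plan_items: list[str], status: dict[str, tuple[str, str]]) -> str:
    """Render a plan with each item's final state and note folded into one line."""
    lines = ["## Plan"]
    for item in plan_items:
        state, note = status.get(item, ("todo", ""))
        mark = MARKS.get(state)
        prefix = f"- {mark} {item}" if mark else f"- {item}"
        lines.append(f"{prefix}: {note}" if note else prefix)
    return "\n".join(lines)


# Usage: five timestamped updates collapse into three plan lines.
plan = ["Implement user authentication", "Add database schema", "Create API endpoints"]
status = {
    "Implement user authentication": ("done", "Complete, no issues"),
    "Add database schema": ("done", "Complete after fixing validation bug"),
    "Create API endpoints": ("in_progress", "In progress (3/5 complete)"),
}
compacted = compact_plan(plan, status)
```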

Why This Works

Proactive = Quality: Compacting at 50% gives you room to optimize. At 95%, you're in triage mode.
Frequent = Fresh: Small, regular compactions are easier to verify than massive emergency summarizations.
Intentional = Controlled: You decide what to preserve based on priority order, not panic.

The Trade-off

Frequent intentional compaction requires active monitoring of context utilization and deliberate intervention; it is not set-and-forget. The investment is ongoing attention; the return is sustained quality.

Real-World Results: HumanLayer's ACE-FCA framework demonstrated this approach by shipping 35,000 lines of code in 7 hours. The key wasn't speed; it was maintaining quality through aggressive proactive compaction at 40-60% utilization, which avoids the degradation that comes with emergency summarization at 95%.

Contrast with Emergency Compaction

Approach	Trigger Point	Quality Impact	Control
Emergency Auto-Compact	95%+ capacity	Degrades (forced summarization)	Low (automatic)
Frequent Intentional	40-60% capacity	Maintains (controlled compression)	High (deliberate)
No Compaction	Never (boot fresh agents)	High (fresh context)	Highest (manual restart)

Our default advice remains "boot a new agent" for most cases. Frequent intentional compaction is for scenarios where agent continuity matters—long-running sessions, accumulated state, or workflows where restarting is expensive.

When to Use

Multi-hour coding sessions where restarting loses momentum
Workflows that accumulate valuable learned context
Situations where handoff overhead exceeds compaction cost
Teams optimizing for sustained agent performance over time

When to Boot Fresh Instead

Short, focused tasks (< 30 minutes)
Clear task boundaries where restart is natural
Quality concerns outweigh continuity needs
Agent output shows signs of degradation

Federated Knowledge Architecture

[2026-02-06]: Most context strategies assume a single repository or bounded knowledge source. Federated knowledge extends context management to distributed systems—multiple repositories, microservices, external APIs, and community knowledge sources.

The Distributed Context Problem

Scenario: Agent needs to understand authentication flow that spans:

api-gateway repo (entry point)
auth-service repo (token validation)
user-database repo (schema definitions)
Company wiki (design decisions)
Third-party OAuth docs (protocol specs)

Traditional approaches fail:

Loading all repos into context → token budget exhausted
Loading one repo → agent misses cross-repo dependencies
Manual context assembly → error-prone, doesn't scale

How Federated Knowledge Works

1. Knowledge Aggregation

Pull from multiple sources into unified cache:

.bmad-fks-cache/
├── api-gateway/           # Git repo 1
│   └── [cached files]
├── auth-service/          # Git repo 2
│   └── [cached files]
├── design-docs/           # Web pages converted to PDF
│   └── auth-flow.pdf
└── oauth-spec/            # External docs
    └── rfc-6749.pdf

Each source maintains an independent cache directory. Updates refresh one cache without disrupting the others.

2. Unified Context Map

Central context.md file maps sources → cached locations:

# Federated Knowledge Sources
 
## Internal Repositories
- **api-gateway**: `.bmad-fks-cache/api-gateway/` - Entry point, routing logic
- **auth-service**: `.bmad-fks-cache/auth-service/` - Token validation, session management
- **user-database**: `.bmad-fks-cache/user-database/` - User schema, migrations
 
## Design Documentation
- **Authentication Flow**: `.bmad-fks-cache/design-docs/auth-flow.pdf` - 2024-11 design decision
- **Session Strategy**: `.bmad-fks-cache/design-docs/session-strategy.pdf` - Redis vs JWT tradeoff
 
## External References
- **OAuth 2.0 RFC**: `.bmad-fks-cache/oauth-spec/rfc-6749.pdf` - Protocol specification

The agent loads context.md first, gaining awareness of all available knowledge sources. Specific sources load on-demand based on task needs.
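A context map in this shape is easy to turn into a lookup table. The sketch below assumes every entry follows the exact line format shown in the example map; the regular expression and function name are illustrative and not part of BMAD-METHOD.

```python
import re

# Matches lines like: - **name**: `path` - description
ENTRY = re.compile(r"^- \*\*(?P<name>[^*]+)\*\*: `(?P<path>[^`]+)` - (?P<desc>.+)$")


def parse_context_map(text: str) -> dict[str, dict]:
    """Map each source name to its cached path, description, and section."""
    sources: dict[str, dict] = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            section = line[3:]
            continue
        match = ENTRY.match(line)
        if match:
            sources[match["name"]] = {
                "path": match["path"],
                "description": match["desc"],
                "section": section,
            }
    return sources


# Usage with two entries taken from the map above.
example = """\
# Federated Knowledge Sources

## Internal Repositories
- **api-gateway**: `.bmad-fks-cache/api-gateway/` - Entry point, routing logic

## External References
- **OAuth 2.0 RFC**: `.bmad-fks-cache/oauth-spec/rfc-6749.pdf` - Protocol specification
"""
sources = parse_context_map(example)
```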

3. Multi-Source Navigation

Agent workflow:

Read context.md to understand available sources
Identify relevant sources for current task
Load specific files from relevant source caches
Cross-reference between sources when dependencies exist

Example for "implement JWT validation":

Load auth-service/jwt.py (current implementation)
Load oauth-spec/rfc-6749.pdf (protocol requirements)
Load design-docs/auth-flow.pdf (why JWT was chosen)

4. Priority-Based Conflict Resolution

When sources provide conflicting information:

Priority hierarchy:
1. Local directory (current working context)
2. Project-specific repos (company codebases)
3. Organization-wide docs (internal wikis)
4. Community knowledge (external docs, RFCs)

Example conflict:

Local auth.py uses JWT expiry = 1 hour
OAuth RFC recommends 10 minutes
Company wiki says "use 1 hour for mobile clients"

Resolution: Local implementation (priority 1) takes precedence. Agent notes the discrepancy but trusts local context unless explicitly asked to align with external standards.
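The priority hierarchy can be sketched as a sort over competing claims. The tier labels, the claim dictionary shape, and the function name are assumptions for illustration; the values mirror the JWT expiry example, expressed in seconds.

```python
PRIORITY = {"local": 1, "project": 2, "organization": 3, "community": 4}


def resolve(claims: list[dict]) -> dict:
    """Pick the highest-priority claim and list lower-priority disagreements."""
    if not claims:
        raise ValueError("at least one claim is required")
    ranked = sorted(claims, key=lambda c: PRIORITY[c["tier"]])
    winner = ranked[0]
    discrepancies = [c for c in ranked[1:] if c["value"] != winner["value"]]
    return {"winner": winner, "discrepancies": discrepancies}


# Usage: the local 1-hour setting wins; the 10-minute figure is noted, not applied.
result = resolve(
    [
        {"source": "OAuth RFC", "tier": "community", "value": 600},
        {"source": "auth.py", "tier": "local", "value": 3600},
        {"source": "Company wiki", "tier": "organization", "value": 3600},
    ]
)
```

Keeping the discrepancies in the result matches the stated behavior: the agent trusts local context but can still report where external guidance differs.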

When to Use Federated Knowledge

Good fit:

Microservices architectures:

Authentication service + API gateway + multiple backends
Event-driven systems with producers/consumers across repos
Shared libraries with separate documentation repos

Multi-repo organizations:

Frontend + backend + infrastructure as separate repos
Platform teams maintaining shared components
Services with cross-cutting concerns (logging, monitoring, auth)

Hybrid internal/external knowledge:

Internal implementation + external protocol specs (OAuth, SAML)
Company code + third-party API documentation
Custom solutions + industry standards

Poor fit:

Single repository projects:

Monolithic applications
Small services without external dependencies
Projects with self-contained documentation

When context already fits:

Simple codebases where everything loads in one context
Well-documented single repos
Projects without cross-repo dependencies

Implementation: BMAD-METHOD Example

BMAD-METHOD implements federated knowledge via external extension: vishalmysore/bmad-federated-knowledge

Configuration example:

# .bmad-fks.yaml
sources:
  - name: api-gateway
    type: git
    url: https://github.com/company/api-gateway
    branch: main
    cache_dir: .bmad-fks-cache/api-gateway
 
  - name: auth-service
    type: git
    url: https://github.com/company/auth-service
    branch: main
    cache_dir: .bmad-fks-cache/auth-service
 
  - name: design-docs
    type: web
    urls:
      - https://wiki.company.com/auth-flow
      - https://wiki.company.com/session-strategy
    cache_dir: .bmad-fks-cache/design-docs
    format: pdf
 
  - name: oauth-spec
    type: web
    urls:
      - https://datatracker.ietf.org/doc/html/rfc6749
    cache_dir: .bmad-fks-cache/oauth-spec
    format: pdf

Workflow:

Developer runs bmad-fks sync - pulls all sources into cache
Agent loads context.md - gains awareness of federated sources
Task-specific context loading - agent reads relevant cached files
Updates propagate independently - each source can refresh without affecting others

Context Loading Pattern for Federated Sources

Progressive disclosure applies:

# Initial load (metadata only, ~50 tokens)
Available sources: api-gateway, auth-service, user-database, design-docs, oauth-spec
 
# On selection (full context, ~2000 tokens per source)
Loading: auth-service
  - jwt.py (implementation)
  - tests/test_jwt.py (test coverage)
  - README.md (setup instructions)

Don't load all sources upfront. Load source summaries, then expand relevant sources on-demand.
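The load-on-demand pattern can be sketched with three access levels: source summaries, file listings, and file contents. The class below uses an in-memory dictionary as a stand-in for the cache directories, and its name, methods, and sample data are hypothetical.

```python
class FederatedSources:
    """Expose federated sources one tier at a time instead of all at once."""

    def __init__(self, sources: dict[str, dict]):
        # sources: name -> {"description": str, "files": {path: content}}
        self._sources = sources

    def list_sources(self) -> dict[str, str]:
        """Tier 1: names and descriptions only (cheap metadata)."""
        return {name: s["description"] for name, s in self._sources.items()}

    def list_files(self, name: str) -> list[str]:
        """Tier 2: file listing for one selected source."""
        return sorted(self._sources[name]["files"])

    def read_file(self, name: str, path: str) -> str:
        """Tier 3: contents of one specific file."""
        return self._sources[name]["files"][path]


# Usage: only auth-service is expanded; api-gateway stays at the metadata tier.
fed = FederatedSources(
    {
        "auth-service": {
            "description": "Token validation, session management",
            "files": {"jwt.py": "def validate(token): ...", "README.md": "Setup"},
        },
        "api-gateway": {
            "description": "Entry point, routing logic",
            "files": {"routes.py": "ROUTES = []"},
        },
    }
)
tier1 = fed.list_sources()
tier2 = fed.list_files("auth-service")
tier3 = fed.read_file("auth-service", "jwt.py")
```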

Trade-Offs

Approach	Context Budget	Discoverability	Maintenance
Single-repo	Tight (everything visible)	Perfect (grep finds all)	Simple
Manual multi-repo	Varies (per-task assembly)	Poor (what exists?)	High (manual sync)
Federated knowledge	Moderate (metadata + selected)	Good (unified map)	Medium (automated sync)

Federated knowledge wins when:

Multiple repos are inevitable (microservices, org structure)
Context budget can't fit everything (large codebases)
Cross-repo understanding is critical (dependencies, shared concerns)

Avoid when:

Single repo works fine
External dependencies are minimal
Context budget is unconstrained

Integration with Existing Context Strategies

Combines with frequent intentional compaction:

Load federated sources → compact irrelevant sections → proceed with focused context
Compaction at 40-60% still applies, but now across multi-source context

Combines with context loading (vs accumulation):

Federated sources are perfect for curated payload model
Orchestrator stages: "load auth-service for this subagent, api-gateway for that one"
Each agent receives precisely the sources it needs

Enables progressive disclosure:

Tier 1: Source names and descriptions (metadata)
Tier 2: File listings within selected source
Tier 3: Specific file contents from selected source

Token Economics

Example: 5 federated sources, 3 active for current task

Metadata for 5 sources:        5 × 50 tokens =     250 tokens
Context map file:                              +   300 tokens
Selected source 1 (auth-service):              + 2,000 tokens
Selected source 2 (api-gateway):               + 1,500 tokens
Selected source 3 (design-docs):               + 1,000 tokens
──────────────────────────────────────────────────────────────
Total:                                         = 5,050 tokens

Compare to loading all 5 sources at roughly 2,000 tokens each: 5 × 2,000 = 10,000 tokens. The federated approach with selective loading saves about 4,950 tokens, roughly half.
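The arithmetic above can be reproduced with a short helper, which makes it easy to rerun with different source sizes. The function name is hypothetical; the figures are the ones from the example.

```python
def selective_load_cost(
    n_sources: int,
    metadata_tokens_each: int,
    map_tokens: int,
    selected_source_tokens: list[int],
) -> int:
    """Total tokens for metadata on every source plus full loads of selected ones."""
    return n_sources * metadata_tokens_each + map_tokens + sum(selected_source_tokens)


# Figures from the example: 5 sources, 3 loaded in full.
selective = selective_load_cost(5, 50, 300, [2_000, 1_500, 1_000])
load_all = 5 * 2_000
savings = load_all - selective
```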

Practical Patterns

1. Cross-Repo Dependency Analysis

Task: "Update authentication to use refresh tokens"
 
Agent reasoning:
- auth-service: Implements token generation (needs update)
- api-gateway: Validates tokens (may need refresh logic)
- design-docs: Why current design doesn't use refresh tokens (context)
 
Agent loads all three sources, analyzes dependencies, proposes update spanning repos.

2. Protocol Compliance Verification

Task: "Ensure OAuth implementation follows RFC 6749"
 
Agent reasoning:
- oauth-spec: Load RFC 6749 requirements
- auth-service: Load current implementation
- Compare: Identify gaps between spec and implementation
 
Agent cross-references external standard with internal code.

3. Historical Context Recovery

Task: "Why do we use 1-hour JWT expiry instead of 10 minutes?"
 
Agent reasoning:
- design-docs: Load historical design decision (2024-11)
- auth-service: Current implementation confirms 1-hour
- oauth-spec: Protocol recommendation is 10 minutes
 
Agent explains: Company decision prioritized mobile UX over security recommendation.

Open Questions

How to handle version skew between cached sources and live repos?
What's the optimal cache refresh strategy? (on-demand, scheduled, manual)
Can conflict resolution be learned from usage patterns?
How to detect when cross-repo dependencies are missing from federated sources?
Does priority-based resolution cover all conflict scenarios, or are there edge cases?

Connections

To Context Fundamentals: Frequent intentional compaction extends the "One Agent, One Task" principle, and federated knowledge extends "context as payload" to multi-source payloads
To Advanced Context Patterns: Covers how frequent intentional compaction relates to ACE's growing contexts, and how the progressive disclosure pattern enables federated source navigation
To Multi-Agent Context: Orchestrators can stage different federated sources for different subagents
To Tool Use: Read tool becomes gateway to federated sources, not just local files
To Patterns: Emergency Context Rewriting anti-pattern demonstrates why reactive compaction fails

Sources

ACE-FCA framework by HumanLayer — Demonstrated in production shipping 35K LOC in 7 hours using proactive compaction at 40-60% utilization
vishalmysore/bmad-federated-knowledge - BMAD extension implementing federated architecture
BMAD-METHOD Scout Report - Detailed analysis including federated knowledge section
Ehrlich, C. & Blackman, T. (2026). "LCM: Lossless Context Management." Voltropy PBC. arXiv:submit/7269166 — Deterministic context architecture with lossless compaction, benchmarked against Claude Code on OOLONG