Design as Bottleneck | Agentic Engineering

When 20-30 agents churn through implementation in minutes, the constraint moves upstream. Design, architecture, and decomposition become the scarce resources. This section presents five mental models for thinking about agentic systems at scale—each grounded in production evidence from multi-agent systems like Gas Town.

Core Questions

Constraint Identification

Where does the bottleneck actually sit when implementation is automated?
What happens to traditional software roles when coding is no longer the slow part?
How does Theory of Constraints apply to agentic workflows?

Execution Models

Are agents conversational partners or execution units?
What execution contract eliminates idle-agent overhead?
How does persistent identity coexist with ephemeral sessions?

Scale and Accountability

When does a "workshop" become a "factory floor"?
What infrastructure appears at each scale transition?
How does attribution change system behavior over time?

Model 1: Design as Bottleneck

When implementation is automated, design becomes the constraint.

The Core Idea

The Theory of Constraints (Goldratt, 1984) holds that a system's throughput is limited by its constraint: at any given time, one bottleneck sets the pace for the whole system. In traditional software development, that bottleneck is often implementation: writing, testing, and debugging code. Developers spend weeks translating designs into working software.

Multi-agent systems invert this. A swarm of 20-30 agents can implement a well-decomposed feature set in minutes. The implementation phase, once the bottleneck, becomes nearly instantaneous relative to design. The constraint moves upstream.

Traditional Software Development:

Design          Implementation          Testing
  │                  │                    │
  ▼                  ▼                    ▼
┌──────┐     ┌──────────────┐      ┌──────────┐
│ Fast │────►│   BOTTLENECK │─────►│ Moderate │
│ 2 hrs│     │   2 weeks    │      │  3 days  │
└──────┘     └──────────────┘      └──────────┘


Multi-Agent Development:

Design              Implementation          Testing
  │                      │                    │
  ▼                      ▼                    ▼
┌──────────────┐   ┌──────────┐        ┌──────────┐
│  BOTTLENECK  │──►│   Fast   │───────►│ Moderate │
│   2-4 hours  │   │ 5 minutes│        │  30 min  │
└──────────────┘   └──────────┘        └──────────┘

The implementation rectangle shrank by orders of magnitude. But design didn't shrink at all—it may even have grown, because now the specification must be precise enough for agents to execute without ambiguity.

What This Replaces

Old Model	New Model
"The hard part is coding"	"The hard part is decomposition"
Invest in faster typing, better IDEs	Invest in specification quality, architectural thinking
Developer productivity = lines/hour	Developer productivity = specifications/hour
Junior devs bottleneck on syntax	Junior devs bottleneck on design clarity
"Ship faster" = more developers	"Ship faster" = better decomposition

Why This Happens

Three properties of multi-agent implementation create the shift:

Parallelism eliminates serial implementation time. Twenty agents working simultaneously reduce wall-clock time by 10-20x compared to a single developer, even accounting for coordination overhead.
Agents don't need ramp-up time. A human developer joining a feature spends hours understanding context. An agent receives its specification and begins immediately.
Agent throughput scales with specification quality. Vague specifications produce incorrect implementations that require rework. Precise specifications produce correct implementations on the first pass. The specification is the leverage point.
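A back-of-the-envelope model shows why adding agents stops helping once implementation collapses. It assumes design and testing are serial, implementation effort divides evenly across agents, and each agent works `agent_speedup` times faster than one developer. The phase durations follow the diagrams above; converting "2 weeks" to 80 hours and "3 days" to 24 hours, and the speedup of 50, are illustrative assumptions, not measurements. `phase_hours` and `bottleneck` are hypothetical helpers.

```python
from typing import Dict


def phase_hours(design_h: float, impl_effort_h: float, test_h: float,
                agents: int, agent_speedup: float = 1.0,
                coordination_h: float = 0.0) -> Dict[str, float]:
    """Wall-clock hours per phase. Design and testing are treated as serial;
    implementation effort is split evenly across agents, each `agent_speedup`
    times faster than one developer, plus a fixed coordination cost."""
    implementation = impl_effort_h / (agents * agent_speedup) + coordination_h
    return {"design": design_h, "implementation": implementation, "testing": test_h}


def bottleneck(phases: Dict[str, float]) -> str:
    """The constraint is simply the slowest phase."""
    return max(phases, key=phases.get)


# One developer: 2 h design, 80 h implementation, 24 h testing (assumed conversions).
solo = phase_hours(2, 80, 24, agents=1)
# Swarm: 3 h design, same effort, 20 agents, 30 min testing.
# agent_speedup=50 is an assumption chosen to land near the diagram's 5 minutes.
swarm = phase_hours(3, 80, 0.5, agents=20, agent_speedup=50)
# Doubling the swarm once design dominates.
bigger = phase_hours(3, 80, 0.5, agents=40, agent_speedup=50)

print(bottleneck(solo), sum(solo.values()))              # implementation 106.0
print(bottleneck(swarm), round(sum(swarm.values()), 2))  # design 3.58
print(bottleneck(bigger), round(sum(bigger.values()), 2))  # design 3.54
```

In this model total time can never fall below design plus testing, however many agents run; doubling the swarm from 20 to 40 saves about 2.4 minutes, while shaving an hour off design saves an hour.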

Evidence

[2026-02-11]: Gas Town (multi-agent orchestration platform) users report that issue decomposition quality directly determines swarm output quality. The system churns through implementation so quickly that design and planning become the dominant time cost. Teams that invest in decomposition skills consistently outperform those optimizing implementation speed.

Implications for Practitioners

Invest in decomposition skills. The ability to break complex problems into well-scoped, independent units is the highest-leverage skill in agentic systems.
Treat specifications as the primary artifact. See Specs as Source Code—specifications are not throwaway scaffolding; they are the program.
Measure design throughput, not implementation throughput. Track how quickly well-formed specifications emerge, not how quickly agents write code.
Front-load architectural decisions. Ambiguity in architecture creates cascading failures across parallel agents. Resolve architectural questions before spawning the swarm.

Model 2: Agents as Pistons

Agents are not conversational partners. They are pistons in an engine.

The Core Idea

Most mental models for AI agents draw from human interaction: agents as assistants, collaborators, or team members. These metaphors import conversational expectations—agents should acknowledge tasks, negotiate scope, report readiness, and confirm understanding.

The piston model discards all of this. A piston fires when compressed gas expands. It doesn't confirm, negotiate, or report status. The mechanical contract is the assignment: compression happens, the piston fires.

Conversational Model:              Piston Model:

Orchestrator: "Can you do X?"      ┌──────────────┐
Agent: "Let me check..."           │   Hook fires │
Agent: "Yes, I can do X"           │      │       │
Orchestrator: "Please proceed"     │      ▼       │
Agent: "Starting X now..."         │   Agent runs │
Agent: "X is 50% complete..."      │      │       │
Agent: "X is done"                 │      ▼       │
                                   │   Work done  │
  7 messages                       └──────────────┘
  ~3,500 tokens overhead
                                     1 message (completion signal)
                                     ~0 tokens overhead

The Execution Contract

The piston model replaces conversation with a contract: if there is work on your hook, you must run it. No waiting for confirmation. No polling for instructions. No announcing readiness and idling.

This contract has three properties:

The hook IS the assignment. Placing work on an agent's hook is the complete instruction. The agent does not need to be "told" to start—the presence of work is the trigger.
Execution is immediate. There is no negotiation phase. The agent begins work the moment it detects its hook has fired.
Completion is the only signal. The agent communicates exactly once: when the work is done. No progress updates, no status checks, no mid-task conversations.
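A minimal sketch of the contract, assuming a hook is just a per-agent FIFO queue and that `execute` stands in for whatever actually runs the model. `Hook`, `WorkItem`, and `run_piston` are hypothetical names for illustration, not Gas Town's API.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class WorkItem:
    task_id: str
    spec: str  # the payload defines the full scope; nothing is negotiated later


@dataclass
class Hook:
    """Hypothetical per-agent hook. Placing an item here IS the assignment."""
    items: Deque[WorkItem] = field(default_factory=deque)

    def place(self, item: WorkItem) -> None:
        self.items.append(item)


def run_piston(hook: Hook, execute: Callable[[WorkItem], str]) -> List[Dict[str, str]]:
    """Execute everything on the hook immediately, in order.

    There is no readiness announcement, no acknowledgement, and no progress
    reporting: the only output is one completion signal per item. When the
    hook is empty the function returns, i.e. the agent is simply not running.
    """
    completions = []
    while hook.items:
        item = hook.items.popleft()
        completions.append({"task_id": item.task_id, "result": execute(item)})
    return completions


hook = Hook()
hook.place(WorkItem("task-42", "Close the JWT validation gap in auth.py"))
hook.place(WorkItem("task-43", "Add a regression test for task-42"))
print(run_piston(hook, lambda item: "success"))
```

Note what is absent: there is no "ready" state to wait in and no message channel back to the orchestrator other than the returned completions.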

What This Replaces

Conversational Model	Piston Model
Agents announce readiness	Agents are always ready
Orchestrator assigns tasks via messages	Hook fires, agent runs
Progress updates flow continuously	Completion is the only signal
Agents negotiate scope	Scope is defined by the hook payload
Idle agents wait for instructions	No idle state exists
Multi-turn assignment protocol	Zero-turn assignment

The Idle Agent Anti-Pattern

The conversational model creates a characteristic failure: agents that announce readiness and wait. An agent that says "I'm ready for my next task" is consuming resources while producing nothing. Worse, it creates a coordination dependency—something must respond to the readiness announcement.

Idle Agent Anti-Pattern:

Agent A: "Ready for work"         ─── idle ───
                                              │
Orchestrator: "Here's task 3"     ◄───────────┘
                                              │
Agent A: "Starting task 3"        ─── idle ───
                                              │
Agent A: [actual work begins]     ◄───────────┘

Time wasted: orchestrator latency + 2 message round trips

In the piston model, this scenario cannot occur. There is no "ready" state. There is only "working" or "not instantiated."

When This Model Applies

Good fit:

High-throughput systems with many agents
Tasks with clear, complete specifications
Workflows where coordination overhead dominates
Systems where agent count exceeds 5-10

Poor fit:

Exploratory tasks requiring clarification
Creative work needing iterative refinement
Tasks where the specification is genuinely incomplete
Single-agent interactions where conversation is natural

Evidence

[2026-02-11]: Gas Town's architecture embodies the piston model through its "propulsion principle"—GUPP (Gas Town Universal Propulsion Principle): if there is work on your hook, you must run it. This principle drives the entire system architecture, eliminating idle-agent overhead and enabling predictable throughput scaling. The absence of negotiation reduces per-task token overhead to near zero.

Model 3: Persistent Identity, Ephemeral Execution

Agents are like employees—permanent identity, but each workday is fresh.

The Core Idea

Two failing models dominate thinking about agent state:

Persistent sessions that accumulate context until they bloat and degrade. The agent "remembers everything" but drowns in irrelevant history.
Fully stateless agents that start fresh every time. The agent "forgets everything" and repeats mistakes, rediscovers solutions, and cannot learn.

The middle path separates who from how:

┌─────────────────────────────────────────────────────┐
│                PERSISTENT IDENTITY                   │
│                                                      │
│  Name: Agent-7                                       │
│  Role: Security Reviewer                             │
│  Skills: [auth, crypto, input-validation]            │
│  Track Record: 47 reviews, 3 critical finds          │
│  Known Patterns: prefers OWASP checklist approach    │
│                                                      │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐           │
│  │Session 1 │  │Session 2 │  │Session 3 │  ...      │
│  │          │  │          │  │          │           │
│  │ Fresh    │  │ Fresh    │  │ Fresh    │           │
│  │ context  │  │ context  │  │ context  │           │
│  │ window   │  │ window   │  │ window   │           │
│  │          │  │          │  │          │           │
│  │ Dispose  │  │ Dispose  │  │ Dispose  │           │
│  │ after    │  │ after    │  │ after    │           │
│  └──────────┘  └──────────┘  └──────────┘           │
│                                                      │
│             EPHEMERAL EXECUTION                      │
└─────────────────────────────────────────────────────┘

The persistent layer holds identity, skills, track record, and accumulated expertise. The ephemeral layer holds the current session's context window—tools, files, and working memory for the immediate task. Sessions are disposable. Identity accumulates.

The Three Layers

Layer	Persists Across Sessions	Contains	Size
Identity	Always	Name, role, skills, configuration	Small (~500 tokens)
History	Always (append-only)	Past session summaries, decisions, outcomes	Growing (compressed)
Session	Never	Current context, working files, tool state	Large (fills context window)

Seancing: Querying the Past

When an agent needs context from a prior session, it performs what Gas Town calls seancing—querying past session records for decisions and rationale. This is analogous to institutional memory in organizations: no employee remembers every meeting, but meeting notes make past decisions retrievable.

Current Session:

Agent: "What authentication approach did we choose for the API?"

  │
  ▼  [searches session history]

Session 12 Summary:
  "Chose JWT with RS256 over session cookies.
   Rationale: stateless scaling, mobile client support.
   Decision maker: Architecture-Agent-3."

  │
  ▼

Agent: [proceeds with JWT context loaded]

Seancing differs from loading full session history. It is targeted retrieval—searching for specific decisions, patterns, or outcomes rather than replaying everything.
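Targeted retrieval can be sketched as a ranked filter over stored session summaries. This version assumes each summary carries topic tags written when the session closes and ranks by tag overlap; a real system might instead use full-text or embedding search. `SessionSummary` and `seance` are hypothetical names, and sessions 13 and 14 are invented for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class SessionSummary:
    session: int
    topics: FrozenSet[str]  # assumed: tags written when the session is closed out
    decision: str


def seance(history: List[SessionSummary], topics: Set[str], limit: int = 3) -> List[SessionSummary]:
    """Targeted retrieval: rank past summaries by topic overlap and return
    only the best matches, rather than replaying the whole history."""
    scored = [(len(s.topics & topics), s) for s in history]
    relevant = [(score, s) for score, s in scored if score > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep session order
    return [s for _, s in relevant[:limit]]


history = [
    SessionSummary(12, frozenset({"auth", "api", "jwt"}),
                   "Chose JWT with RS256 over session cookies. Rationale: stateless "
                   "scaling, mobile client support. Decision maker: Architecture-Agent-3."),
    SessionSummary(13, frozenset({"logging"}), "Switched to structured JSON logs."),
    SessionSummary(14, frozenset({"api", "rate-limiting"}), "Added per-client rate limits."),
]

for hit in seance(history, {"auth", "api"}, limit=1):
    print(f"Session {hit.session}: {hit.decision}")
```

Only the matching decision enters the new session's context; session 13 is never loaded.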

What This Replaces

Failing Model	Problem	Middle Path
Persistent sessions	Context bloat, degraded performance after 10+ interactions	Sessions are ephemeral; dispose after task
Stateless agents	Cannot learn, repeats mistakes, no institutional memory	Identity and history persist; sessions query them
Full history replay	Prohibitive token cost, irrelevant context dilution	Seancing retrieves targeted decisions only

Identity Accumulation in Practice

The persistent identity layer grows through structured updates:

# agent-7-cv.yaml (persists across all sessions)
name: Agent-7
role: security-reviewer
sessions_completed: 47
specializations:
  - authentication-flows
  - input-validation
  - cryptographic-implementations
notable_outcomes:
  - session_12: "Identified JWT key rotation gap"
  - session_31: "Caught SQL injection in parameterized query"
  - session_45: "Flagged timing attack in comparison logic"
patterns_learned:
  - "Check revocation lists when JWT validation present"
  - "Verify constant-time comparison for all secret comparisons"

Each session adds to the CV. New sessions restore context by reading the CV, not by replaying old sessions. The identity grows richer while each session starts clean.
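A sketch of that update cycle, using a plain dict that mirrors a subset of the YAML above (loading the file itself would need a YAML parser such as PyYAML, omitted here). `record_session`, `session_preamble`, and the session-48 outcome are hypothetical.

```python
import copy
from typing import Optional


def record_session(cv: dict, session: int, outcome: Optional[str] = None,
                   pattern: Optional[str] = None) -> dict:
    """Return an updated copy of the CV. A finished session contributes a
    one-line outcome and any new pattern, never its raw context."""
    updated = copy.deepcopy(cv)
    updated["sessions_completed"] += 1
    if outcome:
        updated.setdefault("notable_outcomes", []).append({f"session_{session}": outcome})
    patterns = updated.setdefault("patterns_learned", [])
    if pattern and pattern not in patterns:  # avoid duplicate lessons
        patterns.append(pattern)
    return updated


def session_preamble(cv: dict) -> str:
    """Compact identity text a fresh session reads instead of old transcripts."""
    patterns = "; ".join(cv.get("patterns_learned", []))
    return f"{cv['name']} ({cv['role']}), {cv['sessions_completed']} sessions. Patterns: {patterns}"


cv = {
    "name": "Agent-7",
    "role": "security-reviewer",
    "sessions_completed": 47,
    "notable_outcomes": [{"session_45": "Flagged timing attack in comparison logic"}],
    "patterns_learned": [
        "Check revocation lists when JWT validation present",
        "Verify constant-time comparison for all secret comparisons",
    ],
}
cv = record_session(cv, 48, outcome="Found missing CSRF check on a form handler",
                    pattern="Check CSRF tokens on state-changing routes")
print(session_preamble(cv))
```

The preamble stays small no matter how many sessions have run, because only distilled outcomes and patterns accumulate.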

Evidence

[2026-02-11]: Gas Town implements this model through "polecats"—agents with persistent CVs that accumulate track records across sessions. New sessions restore context via hook-based initialization that loads identity and queries relevant history. The CV structure enables capability-based routing: the orchestrator assigns security reviews to agents with demonstrated security expertise, not to the next available agent.

Implications for Practitioners

Design identity schemas early. What persists across sessions determines what agents can learn. Invest in the structure of the persistent layer.
Make sessions disposable by default. The temptation to preserve session state "in case it's useful" leads to bloat. Default to disposal; query when needed.
Build seancing into workflows. Agents should actively query past decisions when entering domains they've worked in before. This is not automatic—it requires explicit tooling.
Use CVs for routing. Agent identity enables intelligent task assignment. An orchestrator that knows Agent-7 found three critical security issues routes security-sensitive work to Agent-7.

Model 4: Work as Ledger

Every agent action is a timestamped entry in a permanent record.

The Core Idea

Most agent systems treat execution as ephemeral—work happens, results appear, the process vanishes. The ledger model treats execution as accounting: every action, decision, and outcome is a timestamped, attributed entry in a permanent record.

Fire-and-Forget Model:            Ledger Model:

Task → Agent → Result             Task → Agent → Result
                                        │
                                        ▼
                                   ┌──────────────────────┐
                                   │ LEDGER               │
                                   │                      │
                                   │ 14:23 Agent-7 READ   │
                                   │   auth.py (247 lines)│
                                   │                      │
                                   │ 14:24 Agent-7 FOUND  │
                                   │   JWT validation gap │
                                   │   severity: critical │
                                   │                      │
                                   │ 14:25 Agent-7 WROTE  │
                                   │   fix.patch (12 lines)│
                                   │   model: opus-4      │
                                   │   tokens: 3,847      │
                                   │                      │
                                   │ 14:26 Agent-7 DONE   │
                                   │   result: success    │
                                   └──────────────────────┘

What the Ledger Enables

Attribution is not a compliance afterthought—it is system intelligence. The ledger enables capabilities that fire-and-forget systems cannot support:

Capability	How the Ledger Enables It
Debugging	Trace failures to specific agent actions and decisions
Capability routing	Route tasks to agents with proven track records
Performance management	Identify agents that consistently produce quality vs. rework
Model comparison	Compare output quality across models (opus vs. sonnet) for the same task type
Cost attribution	Know which agents and tasks consume the most tokens
Regression detection	Detect when agent performance degrades over time

The Batch-Closure Heresy

One anti-pattern threatens ledger integrity: closing entries retroactively. When an agent marks work as complete after the fact—or worse, when a supervisor closes entries on behalf of agents—the ledger loses its causal ordering.

Correct Ledger:                    Corrupted Ledger:

14:23 START task-42                14:23 START task-42
14:25 READ auth.py                 14:50 [BATCH CLOSE]
14:27 FOUND vulnerability            - READ auth.py     ← no timestamp
14:28 WROTE fix.patch                - FOUND vuln       ← retroactive
14:29 COMPLETE task-42                - WROTE fix        ← rewritten
                                      - COMPLETE task-42 ← bundled

Causal order preserved.            Causal order destroyed.
Debugging: straightforward.        Debugging: impossible.

Batch closure is tempting because it reduces logging overhead. But it destroys the property that makes ledgers valuable: the ability to reconstruct exactly what happened, in what order, and why.
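A minimal append-only ledger that enforces causal ordering: every entry must carry an agent, and no entry may be timestamped earlier than the one before it, so work cannot be back-dated into the record. `Ledger` and `Entry` are hypothetical names. Timestamps are plain integers (seconds) to keep the example deterministic; a real ledger would more likely stamp entries itself from a trusted clock.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    ts: int        # seconds since an arbitrary epoch
    agent: str     # attribution is mandatory
    action: str    # e.g. START, READ, FOUND, WROTE, COMPLETE
    task: str
    detail: str = ""


class Ledger:
    """Append-only record: entries can be added and queried, never edited."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def append(self, entry: Entry) -> None:
        if not entry.agent:
            raise ValueError("unattributed entry rejected")
        if self._entries and entry.ts < self._entries[-1].ts:
            raise ValueError("back-dated entry rejected: causal order would break")
        self._entries.append(entry)

    def history(self, task: Optional[str] = None) -> Tuple[Entry, ...]:
        """Entries in the order they happened, optionally for one task."""
        return tuple(e for e in self._entries if task is None or e.task == task)


ledger = Ledger()
ledger.append(Entry(0, "Agent-7", "START", "task-42"))
ledger.append(Entry(120, "Agent-7", "READ", "task-42", "auth.py"))
ledger.append(Entry(240, "Agent-7", "FOUND", "task-42", "JWT validation gap"))
ledger.append(Entry(360, "Agent-7", "COMPLETE", "task-42"))
print([e.action for e in ledger.history("task-42")])
```

`history` returns a tuple so callers cannot mutate the stored list through it.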

Ledger vs. Logging

The ledger model is not simply "add logging." Logs are diagnostic artifacts—searched when something breaks. The ledger is an operational artifact—queried continuously for routing, attribution, and improvement.

Property	Logging	Ledger
Purpose	Diagnosis after failure	Continuous system intelligence
Queried	When something breaks	Every task assignment, every review
Retention	Rotated, compressed, archived	Permanent, append-only
Structure	Free-form text	Structured entries with agent, action, outcome
Attribution	Optional	Required for every entry
Consumers	Humans debugging	Orchestrators, agents, and humans

When This Model Applies

Good fit:

Multi-agent systems where multiple agents contribute to outcomes
Systems requiring auditability or compliance
Long-running projects where agent performance must improve over time
Environments comparing multiple models or agent configurations

Poor fit:

Single-agent, single-session interactions
Prototyping where overhead must be minimal
Tasks where the process is less important than the result
Ephemeral scripts and one-off automation

Evidence

[2026-02-11]: Gas Town implements the ledger model through its "Bead" system—a permanent work record where every agent action is a timestamped, attributed entry. The Bead system enables capability routing (assigning tasks based on past performance), regression detection (flagging agents whose quality has declined), and cost attribution (tracking token spend per agent and task type). The "Batch-Closure Heresy"—closing entries retroactively—is explicitly forbidden because it corrupts the causal ordering that makes the ledger useful.

Implications for Practitioners

Design attribution into the system from day one. Retrofitting attribution onto an existing system is much harder than building it in, and work done before the retrofit stays unattributed.
Make the ledger queryable, not just writable. A ledger that can only be read by humans is a log. A ledger that orchestrators query for routing decisions is system intelligence.
Enforce causal ordering. Every entry must have a timestamp and an agent attribution. Batch closures destroy the causal structure.
Use the ledger for routing. Agent assignment should consider past performance on similar tasks, not just availability.
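A sketch of the last two practices, querying the ledger for routing and for cost attribution, over simplified outcome records. The `Outcome` shape, the minimum-sample rule, the fallback to the first available agent, and the sample records are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Outcome(NamedTuple):
    agent: str
    task_type: str
    success: bool
    tokens: int


def track_record(ledger: List[Outcome], task_type: str) -> Dict[str, List[int]]:
    """agent -> [successes, attempts] for one task type."""
    stats: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for o in ledger:
        if o.task_type == task_type:
            stats[o.agent][0] += int(o.success)
            stats[o.agent][1] += 1
    return dict(stats)


def route(ledger: List[Outcome], task_type: str, available: List[str],
          min_samples: int = 2) -> Optional[str]:
    """Pick the available agent with the best success rate on this task type,
    ignoring agents with too few attempts; fall back to the first available."""
    stats = track_record(ledger, task_type)
    threshold = max(min_samples, 1)  # never divide by zero attempts
    qualified = [a for a in available if stats.get(a, [0, 0])[1] >= threshold]
    if not qualified:
        return available[0] if available else None
    return max(qualified, key=lambda a: stats[a][0] / stats[a][1])


def tokens_by_agent(ledger: List[Outcome]) -> Dict[str, int]:
    """Cost attribution: total tokens spent per agent."""
    totals: Dict[str, int] = defaultdict(int)
    for o in ledger:
        totals[o.agent] += o.tokens
    return dict(totals)


records = [
    Outcome("Agent-7", "security-review", True, 3847),
    Outcome("Agent-7", "security-review", True, 4100),
    Outcome("Agent-2", "security-review", True, 2900),
    Outcome("Agent-2", "security-review", False, 5200),
    Outcome("Agent-9", "security-review", True, 1000),
]
print(route(records, "security-review", ["Agent-2", "Agent-7", "Agent-9"]))
print(tokens_by_agent(records))
```

Agent-9 is skipped despite a perfect record because one attempt is too little evidence under the assumed `min_samples=2`.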

Model 5: Factory Floor vs. Workshop

The transition from 3 agents to 30 is not gradual—it is a phase change.

The Core Idea

Two distinct models exist for organizing agentic work:

WORKSHOP (1-3 agents):            FACTORY FLOOR (10-30 agents):

┌─────────────────────┐           ┌──────────────────────────────────┐
│                     │           │  ┌──────────┐   ┌──────────┐     │
│  Human ◄──► Agent   │           │  │Supervisor│   │  Health  │     │
│         ◄──► Agent  │           │  │  Agent   │   │ Monitor  │     │
│                     │           │  └─────┬────┘   └──────────┘     │
│  Direct management  │           │        │                         │
│  Conversational     │           │  ┌─────┼──────────────────┐      │
│  Each agent visible │           │  │     │   Merge Queue    │      │
│                     │           │  │  ┌──┴──┐ ┌─────┐ ┌───┐ │      │
└─────────────────────┘           │  │  │A│B│C│ │D│E│F│ │...│ │      │
                                  │  │  └─────┘ └─────┘ └───┘ │      │
                                  │  └────────────────────────┘      │
                                  │                                  │
                                  │  Infrastructure required:        │
                                  │  - Supervisors                   │
                                  │  - Merge queues                  │
                                  │  - Health monitoring             │
                                  │  - Conflict resolution           │
                                  │  - Work attribution              │
                                  └──────────────────────────────────┘

The workshop is intimate. A human directly manages each agent, sees all output, and handles coordination through conversation. This works well for 1-3 agents—the cognitive load is manageable, and the overhead of infrastructure is not justified.

The factory floor is industrial. No human can directly manage 20+ agents. Infrastructure replaces direct management: supervisors route work, merge queues prevent conflicts, health monitors detect failures, and attribution tracks accountability.

The Phase Change

The transition between these models is not gradual. It is a phase change—a discontinuity where the old model stops working and new infrastructure becomes necessary.

Agent Count:    1    3    5    8    10   15   20   30
                │    │    │    │    │    │    │    │
Workshop:       ████████████████░░░░░░░░░░░░░░░░░░
                              ▲
                              │ Phase transition
                              │ (chaos zone)
                              ▼
Factory:        ░░░░░░░░░░░░░░████████████████████

Infrastructure needed at each level:

1-3 agents:   Nothing. Direct management works.
4-7 agents:   Merge strategy. File conflicts appear.
8-12 agents:  Supervisors. Humans cannot track all agents.
13-20 agents: Health monitoring. Silent failures occur.
21+ agents:   Full infrastructure. Attribution, queues,
              conflict resolution, automated routing.

Scale Infrastructure Table

Agent Count	Infrastructure Required	Why
1-3	None	Human directly manages each agent
4-7	Merge strategy	Multiple agents editing overlapping files
8-12	Supervisors	Human cannot track status of all agents simultaneously
13-20	Health monitoring	Silent agent failures go undetected without automated checks
21-30	Full orchestration	Attribution, queuing, conflict resolution, automated routing
31+	Hierarchical management	Single orchestrator cannot manage all agents; sub-orchestrators needed
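The scale table can be read as a cumulative lookup in which each tier keeps the infrastructure below it. That cumulative reading is an assumption, as is treating each row's lower bound as a hard cut-off (here full orchestration starts at 21 agents and hierarchical management at 31), even though the transition points likely vary by domain. `TIERS` and `required_infrastructure` are hypothetical.

```python
from bisect import bisect_right
from typing import List

# (smallest agent count at which the tier applies, infrastructure it adds)
TIERS = [
    (4, "merge strategy"),
    (8, "supervisors"),
    (13, "health monitoring"),
    (21, "full orchestration"),
    (31, "hierarchical management"),
]
_STARTS = [start for start, _ in TIERS]


def required_infrastructure(agent_count: int) -> List[str]:
    """Everything needed at this size: all tiers whose threshold has been reached."""
    reached = bisect_right(_STARTS, agent_count)
    return [name for _, name in TIERS[:reached]]


for n in (2, 6, 10, 25, 40):
    print(n, required_infrastructure(n))
```

`bisect_right` counts how many thresholds are less than or equal to the agent count, so a tier switches on exactly at its lower bound.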

Two Failure Modes

Premature Factory Infrastructure: Building supervisor agents, merge queues, and health monitors for a 2-agent system. The overhead exceeds the value. Simple direct management works better. The infrastructure becomes complexity without benefit—bureaucracy for a team of two.

Delayed Factory Infrastructure: Running 15 agents with workshop-style direct management. Merge conflicts multiply, silent failures go undetected, and the human becomes the bottleneck—unable to review all output, missing critical issues, and drowning in coordination overhead. The system appears to work but produces lower quality than a well-managed 5-agent workshop.

Steve Yegge's Eight Levels

Steve Yegge's framework for AI-assisted development describes a spectrum from Level 1 (AI as autocomplete) to Level 8 (AI-first development). Most practitioners operate at Levels 1-4 (workshop). Factory-floor systems like Gas Town target Levels 7-8, where the development model fundamentally changes.

Level Range	Mode	Agent Relationship
1-2	Autocomplete, chat	AI assists human coding
3-4	Pair programming, delegation	AI handles defined subtasks
5-6	Supervised autonomy	AI works independently with checkpoints
7-8	Factory floor	Human designs, AI swarm implements

The levels are not a maturity model where everyone should reach Level 8. They describe different modes suitable for different contexts. Most tasks are well-served at Levels 3-4. Factory-floor orchestration is necessary only when the task scope genuinely demands 10+ concurrent agents.

What This Replaces

Workshop Assumption	Factory Floor Reality
"Add more agents to go faster"	More agents without infrastructure causes chaos
"I can manage 10 agents like I manage 2"	Cognitive load grows superlinearly with agent count
"Infrastructure is premature optimization"	Infrastructure becomes necessary at ~8 agents
"Scaling is linear"	Scaling has phase transitions requiring new architecture

Evidence

[2026-02-11]: Gas Town's entire architecture is factory-floor infrastructure—supervisors, health monitoring, merge queues, work attribution, and automated routing. The system targets Yegge's Level 7-8 operations, where 20-30 agents execute in parallel. Teams that attempt Gas Town's scale without its infrastructure (or equivalent) consistently encounter the chaos zone: merge conflicts, silent failures, and human bottlenecks. The architecture itself is evidence that the phase transition is real and requires deliberate infrastructure investment.

When to Transition

The decision to move from workshop to factory is not about ambition—it is about observable symptoms:

Transition Signals (Workshop → Factory):

  Merge conflicts appearing?
  ├── No → Stay in workshop
  └── Yes
       └── Agent failures going undetected?
            ├── No → Add merge strategy only
            └── Yes
                 └── Human overwhelmed by review volume?
                      ├── No → Add health monitoring
                      └── Yes → Build factory infrastructure

Each symptom maps to specific infrastructure. Build only what the symptoms demand—premature infrastructure is its own failure mode.

The Models in Combination

These five models form an interlocking system. Each addresses a different aspect of multi-agent work at scale:

┌────────────────────────────────────────────────────────┐
│                                                        │
│  DESIGN AS BOTTLENECK                                  │
│  (what to invest in)                                   │
│         │                                              │
│         ▼                                              │
│  ┌──────────────┐     ┌────────────────────────┐       │
│  │ AGENTS AS    │     │ PERSISTENT IDENTITY,   │       │
│  │ PISTONS      │     │ EPHEMERAL EXECUTION    │       │
│  │              │     │                        │       │
│  │ (how agents  │     │ (how agents persist    │       │
│  │  execute)    │     │  across sessions)      │       │
│  └──────┬───────┘     └───────────┬────────────┘       │
│         │                         │                    │
│         └──────────┬──────────────┘                    │
│                    │                                   │
│                    ▼                                   │
│         ┌──────────────────┐                           │
│         │ WORK AS LEDGER   │                           │
│         │                  │                           │
│         │ (how work is     │                           │
│         │  tracked)        │                           │
│         └────────┬─────────┘                           │
│                  │                                     │
│                  ▼                                     │
│         ┌──────────────────┐                           │
│         │ FACTORY FLOOR    │                           │
│         │ vs. WORKSHOP     │                           │
│         │                  │                           │
│         │ (what scale      │                           │
│         │  demands)        │                           │
│         └──────────────────┘                           │
│                                                        │
└────────────────────────────────────────────────────────┘

Design as Bottleneck identifies where to invest: upstream, in decomposition and specification.

Agents as Pistons defines the execution contract: the hook fires, the agent runs, and completion is the only signal, with no conversational overhead.

Persistent Identity, Ephemeral Execution solves the state management problem: agents learn without bloating.

Work as Ledger provides accountability: every action is recorded, attributed, and queryable.

Factory Floor vs. Workshop determines what infrastructure these models require at each scale level.

Each model is useful independently. Together, they describe a coherent philosophy for building multi-agent systems that scale.

Anti-Patterns

Conversational Orchestration at Scale

The mistake: Using multi-turn message exchanges to assign work to 15+ agents. "Agent-3, are you available? Great, here's task 7. Agent-3, how's it going? Agent-3, are you done yet?"

Why it fails: Orchestration overhead grows quadratically with agent count. At 20 agents, the orchestrator spends more tokens coordinating than agents spend working.

The fix: Piston model. Hook fires, agent runs, completion signal returns. Zero conversational overhead.
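One way to see where quadratic growth can come from, assuming a single orchestrator that re-reads its whole transcript each time a message arrives. The 7 messages and roughly 500 tokens per message come from the ~3,500-token exchange in the piston diagram; the re-reading behaviour is an assumption, and `orchestrator_tokens` is a hypothetical helper.

```python
def orchestrator_tokens(agents: int, msgs_per_agent: int = 7, tokens_per_msg: int = 500) -> int:
    """Tokens an orchestrator processes if each new message triggers a turn
    that re-reads the whole transcript so far (an assumption)."""
    total_msgs = agents * msgs_per_agent
    # Turn i re-reads i messages; summing i over 1..M gives M(M+1)/2 messages read.
    return tokens_per_msg * total_msgs * (total_msgs + 1) // 2


for n in (5, 10, 20):
    conversational = orchestrator_tokens(n)
    piston = orchestrator_tokens(n, msgs_per_agent=1)  # completion signal only
    print(f"{n:>2} agents: conversational={conversational:,} piston={piston:,}")
```

Under these assumptions, doubling the swarm from 10 to 20 agents roughly quadruples the orchestrator's load (1,242,500 to 4,935,000 tokens), and replacing the seven-message exchange with a single completion signal cuts it 47-fold at 20 agents (105,000 tokens). An orchestrator that summarizes or prunes its context would grow more slowly.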

Stateless Swarms

The mistake: Spawning fresh, identity-free agents for every task. No history, no expertise accumulation, no capability routing.

Why it fails: The system cannot improve. Agent-7's excellent security review in Session 12 is invisible to Session 13's task assignment. Every task is assigned to a random agent regardless of fit.

The fix: Persistent identity with ephemeral execution. CVs accumulate track records. Orchestrators query CVs for routing.

Fire-and-Forget Without Attribution

The mistake: Agents produce outputs but the system does not record who did what, when, or how. Results appear but the process is opaque.

Why it fails: Debugging requires replaying entire sessions. Performance improvement is impossible because there is no data. Cost optimization is guesswork because token spend is not attributed.

The fix: Ledger model. Every action is a timestamped, attributed entry. The ledger is queryable by orchestrators, not just searchable by humans.

Workshop Mentality at Factory Scale

The mistake: A human attempting to directly manage 20 agents the same way they manage 2. Reading every output, making every routing decision, manually resolving every conflict.

Why it fails: The human becomes the bottleneck. Agent throughput is limited by human review speed. The entire point of scaling—parallel throughput—is negated by serial human review.

The fix: Factory floor infrastructure. Supervisors, health monitoring, merge queues, automated routing. The human designs and reviews; infrastructure manages execution.

Open Questions

Where is the next bottleneck after design? If design automation improves (agents generating specifications from high-level intent), what becomes the new constraint? Requirements? Strategy? Taste?
Can the piston model support creative work? Creative tasks may require the conversational back-and-forth that the piston model eliminates. Is there a hybrid model that preserves execution efficiency for implementation while allowing conversation for creative exploration?
What is the optimal CV structure? Too detailed and the CV consumes excessive context. Too sparse and routing decisions lack signal. What information density maximizes routing quality per token?
How large can ledgers grow before they become unwieldy? Append-only ledgers grow without bound. What compression, summarization, or archival strategies maintain queryability without unbounded growth?
Is the phase transition predictable? The workshop-to-factory transition appears to occur around 8-12 agents, but this number likely varies by domain and task complexity. Can the transition point be predicted from task characteristics?
What happens above 100 agents? Gas Town operates at 20-30 agents. Model-native swarm systems (e.g., Kimi K2.5) demonstrate 100+ concurrent subagents. Does factory-floor infrastructure scale to this level, or does a third organizational model emerge?
How does the ledger interact with privacy constraints? In regulated environments, permanent attribution records may conflict with data retention policies. How does the ledger model adapt to privacy requirements?

Connections

Execution Topologies: The five topologies describe shapes of agent work; these models describe principles for operating agent systems at scale. Factory floor infrastructure enables wider and deeper topologies. The piston model reduces friction in the measurement framework.
Specs as Source Code: Design as Bottleneck is the logical consequence of treating specs as source code. If specifications are the primary programming surface, and implementation is automated, then specification quality is the constraint.
Pit of Success: The piston model is a pit-of-success design for agent execution. The easiest path (hook fires, agent runs) is also the correct path. No negotiation means no negotiation failures.
Self-Improving Experts: Persistent Identity, Ephemeral Execution is the mental model behind self-improving experts. The pattern (expertise files, session learning) implements the model (persistent identity, ephemeral sessions).
Orchestrator Pattern: Factory floor infrastructure is what orchestrator patterns become at scale. The orchestrator pattern describes coordination; Factory Floor vs. Workshop describes when coordination requires dedicated infrastructure.
Expert Swarm Pattern: Expert swarms operate at factory-floor scale. The swarm pattern implements the piston model (agents as execution units) and the ledger model (work attribution across swarm members).
Workflow Coordination: Operational implementation of the infrastructure that factory-floor scale demands—merge strategies, health monitoring, and conflict resolution.
Context Fundamentals: Ephemeral execution is a context management strategy. Disposing sessions and loading identity via CV keeps context windows fresh and relevant.
Operating Agent Swarms: Operational practices for when the factory floor is running—cost management, incident response, and scale transitions that emerge once design bottlenecks are addressed.

Sources

Eliyahu M. Goldratt and Jeff Cox, The Goal (1984) — Theory of Constraints: at any given time, one bottleneck limits a system's throughput
Gas Town multi-agent orchestration platform — production evidence for piston model, persistent identity, ledger system, and factory-floor infrastructure
Steve Yegge, "Eight Levels of AI-Assisted Development" — framework for classifying human-AI development modes
Rico Mariani, .NET Framework design — original "Pit of Success" concept, applied here to agent execution contracts