When 20-30 agents churn through implementation in minutes, the constraint moves upstream. Design, architecture, and decomposition become the scarce resources. This section presents six mental models for thinking about agentic systems at scale, each grounded in production evidence from multi-agent systems like Gas Town or in practitioner research on human-AI interaction.
Core Questions
Constraint Identification
- Where does the bottleneck actually sit when implementation is automated?
- What happens to traditional software roles when coding is no longer the slow part?
- How does Theory of Constraints apply to agentic workflows?
Execution Models
- Are agents conversational partners or execution units?
- What execution contract eliminates idle-agent overhead?
- How does persistent identity coexist with ephemeral sessions?
Scale and Accountability
- When does a "workshop" become a "factory floor"?
- What infrastructure appears at each scale transition?
- How does attribution change system behavior over time?
Model 1: Design as Bottleneck
When implementation is automated, design becomes the constraint.
The Core Idea
The Theory of Constraints (Goldratt, 1984) states that every system has exactly one bottleneck limiting throughput. In traditional software development, that bottleneck is often implementation—writing, testing, and debugging code. Developers spend weeks translating designs into working software.
Multi-agent systems invert this. A swarm of 20-30 agents can implement a well-decomposed feature set in minutes. The implementation phase, once the bottleneck, becomes nearly instantaneous relative to design. The constraint moves upstream.
Traditional Software Development:
Design Implementation Testing
│ │ │
▼ ▼ ▼
┌──────┐ ┌──────────────┐ ┌──────────┐
│ Fast │────►│ BOTTLENECK │─────►│ Moderate │
│ 2 hrs│ │ 2 weeks │ │ 3 days │
└──────┘ └──────────────┘ └──────────┘
Multi-Agent Development:
Design Implementation Testing
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────┐ ┌──────────┐
│ BOTTLENECK │──►│ Fast │───────►│ Moderate │
│ 2-4 hours │ │ 5 minutes│ │ 30 min │
└──────────────┘ └──────────┘ └──────────┘
The implementation rectangle shrank by orders of magnitude. But design didn't shrink at all—it may even have grown, because now the specification must be precise enough for agents to execute without ambiguity.
What This Replaces
| Old Model | New Model |
|---|---|
| "The hard part is coding" | "The hard part is decomposition" |
| Invest in faster typing, better IDEs | Invest in specification quality, architectural thinking |
| Developer productivity = lines/hour | Developer productivity = specifications/hour |
| Junior devs bottleneck on syntax | Junior devs bottleneck on design clarity |
| "Ship faster" = more developers | "Ship faster" = better decomposition |
Why This Happens
Three properties of multi-agent implementation create the shift:
- Parallelism eliminates serial implementation time. Twenty agents working simultaneously reduce wall-clock time by 10-20x compared to a single developer, even accounting for coordination overhead (a back-of-the-envelope sketch follows this list).
- Agents don't need ramp-up time. A human developer joining a feature spends hours understanding context. An agent receives its specification and begins immediately.
- Agent throughput scales with specification quality. Vague specifications produce incorrect implementations that require rework. Precise specifications produce correct implementations on the first pass. The specification is the leverage point.
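The 10-20x figure above can be sanity-checked with a rough wall-clock model. The numbers below (80 hours of serial work, 20 agents, a few minutes of coordination cost per agent) are illustrative assumptions, not measurements from any production system:

```python
# Back-of-the-envelope wall-clock estimate for parallel agent implementation.
# All numbers are illustrative assumptions, not measurements.

def wall_clock_hours(total_work_hours: float, agents: int, per_agent_overhead_hours: float) -> float:
    """Ideal parallel split of the work plus a coordination cost per agent."""
    return total_work_hours / agents + per_agent_overhead_hours * agents

serial = wall_clock_hours(80, agents=1, per_agent_overhead_hours=0.0)    # one developer, ~2 weeks
swarm = wall_clock_hours(80, agents=20, per_agent_overhead_hours=0.05)   # 20 agents, ~3 min overhead each

print(f"serial: {serial:.0f} h, swarm: {swarm:.1f} h, speedup: {serial / swarm:.0f}x")
# serial: 80 h, swarm: 5.0 h, speedup: 16x -- inside the 10-20x range described above
```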
Evidence
[2026-02-11]: Gas Town (multi-agent orchestration platform) users report that issue decomposition quality directly determines swarm output quality. The system churns through implementation so quickly that design and planning become the dominant time cost. Teams that invest in decomposition skills consistently outperform those optimizing implementation speed.
[2026-04-11]: Ethan Mollick's practitioner-research synthesis reaches the same conclusion from a different evidentiary tradition. Drawing on the BCG 758-consultant study, METR Long Tasks benchmarks, and the StrongDM software factory case, Mollick argues that in agent-managed organizations, "the window to shape the Thing" — the design and organizational configuration work done before agents execute — is the scarce practitioner resource (Mollick, "The Shape of the Thing," 2026). The book's Model 1 derives this from Theory of Constraints and Gas Town production data (engineering and production-observation traditions). Mollick derives it from organizational research and team-scale production cases. Two independent evidentiary paths reaching identical structural conclusions increase confidence in the model's generality.
Source: Mollick, E. "The Shape of the Thing." One Useful Thing, ~2026-03-12. https://www.oneusefulthing.org/p/the-shape-of-the-thing
Implications for Practitioners
- Invest in decomposition skills. The ability to break complex problems into well-scoped, independent units is the highest-leverage skill in agentic systems.
- Treat specifications as the primary artifact. See Specs as Source Code—specifications are not throwaway scaffolding; they are the program.
- Measure design throughput, not implementation throughput. Track how quickly well-formed specifications emerge, not how quickly agents write code.
- Front-load architectural decisions. Ambiguity in architecture creates cascading failures across parallel agents. Resolve architectural questions before spawning the swarm.
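One way to make "precise enough for agents to execute without ambiguity" concrete is to give each decomposed unit an explicit, structured shape before any agent is spawned. The schema below is a hypothetical illustration of such a unit; the field names and the example task are assumptions, not Gas Town's format:

```python
# Hypothetical schema for one decomposed work unit. The point is that scope,
# interfaces, and acceptance criteria are stated explicitly up front.
from dataclasses import dataclass, field

@dataclass
class SpecUnit:
    unit_id: str
    goal: str                          # one-sentence outcome, not a vague theme
    touches: list[str]                 # files/modules this unit may modify
    interfaces: dict[str, str]         # contracts with other units (name -> signature)
    acceptance: list[str]              # checks an agent can run to confirm completion
    out_of_scope: list[str] = field(default_factory=list)  # explicit non-goals

unit = SpecUnit(
    unit_id="auth-07",
    goal="Add refresh-token rotation to the auth service",
    touches=["services/auth/tokens.py"],
    interfaces={"rotate_refresh_token": "(user_id: str) -> TokenPair"},
    acceptance=["pytest tests/auth/test_rotation.py passes",
                "old refresh token is rejected after rotation"],
    out_of_scope=["changing the access-token TTL"],
)
```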
A Tension to Hold
The factory metaphor that runs through this chapter (and through Software Factories) implies a stable equilibrium: once workflow infrastructure is in place, design is the ongoing human contribution and execution is reliably delegated.
Mollick's 2026 article complicates this. Mollick treats recursive self-improvement (RSI) — AI systems building better AI systems — as an explicit destabilizing factor. RSI is on major lab roadmaps. If the capability profile of the execution layer is improving on a timescale shorter than organizational design cycles, the factory metaphor's stability assumption is challenged: the design that suffices today may be obsolete before the factory runs it to completion. Organizations cannot design once and then operate; the target is moving.
This does not dissolve the design-as-bottleneck framing — design remains the constraint regardless of whether the execution layer is stable or improving. But it challenges the inference that practitioners can invest in a stable harness architecture and return to it. If execution capabilities are shifting, institutional design (see Workflow Coordination §7.5) may need to be treated as continuous rather than one-time. The "organizational experimentation urgency" Mollick prescribes is partly a response to RSI: experiment now, before the target has moved further.
Neither the factory metaphor nor the RSI instability view can be dismissed. This remains an open tension in the field.
Model 2: Agents as Pistons
Agents are not conversational partners. They are pistons in an engine.
The Core Idea
Most mental models for AI agents draw from human interaction: agents as assistants, collaborators, or team members. These metaphors import conversational expectations—agents should acknowledge tasks, negotiate scope, report readiness, and confirm understanding.
The piston model discards all of this. A piston fires when compressed gas expands. It doesn't confirm, negotiate, or report status. The mechanical contract is the assignment: compression happens, the piston fires.
Conversational Model: Piston Model:
Orchestrator: "Can you do X?" ┌─────────────┐
Agent: "Let me check..." │ Hook fires │
Agent: "Yes, I can do X" │ │ │
Orchestrator: "Please proceed" │ ▼ │
Agent: "Starting X now..." │ Agent runs │
Agent: "X is 50% complete..." │ │ │
Agent: "X is done" │ ▼ │
│ Work done │
7 messages └─────────────┘
~3,500 tokens overhead
0 messages
0 tokens overhead
The Execution Contract
The piston model replaces conversation with a contract: if there is work on your hook, you must run it. No waiting for confirmation. No polling for instructions. No announcing readiness and idling.
This contract has three properties:
- The hook IS the assignment. Placing work on an agent's hook is the complete instruction. The agent does not need to be "told" to start—the presence of work is the trigger.
- Execution is immediate. There is no negotiation phase. The agent begins work the moment it detects its hook has fired.
- Completion is the only signal. The agent communicates exactly once: when the work is done. No progress updates, no status checks, no mid-task conversations. (A minimal sketch of this contract follows the list.)
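A minimal sketch of the contract, assuming a generic in-process work queue as the hook rather than Gas Town's actual mechanism. The worker never announces readiness and emits exactly one signal per unit of work:

```python
# Minimal piston-style worker loop: no readiness announcements, no negotiation,
# one completion signal per unit of work. `hook` is a stand-in for whatever
# queue or filesystem trigger actually delivers work; it is not Gas Town's API.
import queue
import threading

def run_piston(hook: queue.Queue, execute, report_done) -> None:
    while True:
        work = hook.get()                 # the presence of work IS the assignment
        result = execute(work)            # start immediately; no confirmation round-trip
        report_done(work["id"], result)   # completion is the only signal emitted
        hook.task_done()

if __name__ == "__main__":
    hook = queue.Queue()
    done = lambda wid, res: print(f"{wid}: {res}")
    for _ in range(3):                    # three pistons draining one hook
        threading.Thread(target=run_piston,
                         args=(hook, lambda w: f"ran {w['task']}", done),
                         daemon=True).start()
    for i in range(5):
        hook.put({"id": f"task-{i}", "task": "implement spec unit"})
    hook.join()                           # wait until every hook entry has fired
```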
What This Replaces
| Conversational Model | Piston Model |
|---|---|
| Agents announce readiness | Agents are always ready |
| Orchestrator assigns tasks via messages | Hook fires, agent runs |
| Progress updates flow continuously | Completion is the only signal |
| Agents negotiate scope | Scope is defined by the hook payload |
| Idle agents wait for instructions | No idle state exists |
| Multi-turn assignment protocol | Zero-turn assignment |
The Idle Agent Anti-Pattern
The conversational model creates a characteristic failure: agents that announce readiness and wait. An agent that says "I'm ready for my next task" is consuming resources while producing nothing. Worse, it creates a coordination dependency—something must respond to the readiness announcement.
Idle Agent Anti-Pattern:
Agent A: "Ready for work" ─── idle ───
│
Orchestrator: "Here's task 3" ◄───────────┘
│
Agent A: "Starting task 3" ─── idle ───
│
Agent A: [actual work begins] ◄───────────┘
Time wasted: orchestrator latency + 2 message round trips
In the piston model, this scenario cannot occur. There is no "ready" state. There is only "working" or "not instantiated."
When This Model Applies
Good fit:
- High-throughput systems with many agents
- Tasks with clear, complete specifications
- Workflows where coordination overhead dominates
- Systems where agent count exceeds 5-10
Poor fit:
- Exploratory tasks requiring clarification
- Creative work needing iterative refinement
- Tasks where the specification is genuinely incomplete
- Single-agent interactions where conversation is natural
Evidence
[2026-02-11]: Gas Town's architecture embodies the piston model through its "propulsion principle"—GUPP (Gas Town Universal Propulsion Principle): if there is work on your hook, you must run it. This principle drives the entire system architecture, eliminating idle-agent overhead and enabling predictable throughput scaling. The absence of negotiation reduces per-task token overhead to near zero.
Model 3: Persistent Identity, Ephemeral Execution
Agents are like employees—permanent identity, but each workday is fresh.
The Core Idea
Two failing models dominate thinking about agent state:
- Persistent sessions that accumulate context until they bloat and degrade. The agent "remembers everything" but drowns in irrelevant history.
- Fully stateless agents that start fresh every time. The agent "forgets everything" and repeats mistakes, rediscovers solutions, and cannot learn.
The middle path separates who from how:
┌─────────────────────────────────────────────────────┐
│ PERSISTENT IDENTITY │
│ │
│ Name: Agent-7 │
│ Role: Security Reviewer │
│ Skills: [auth, crypto, input-validation] │
│ Track Record: 47 reviews, 3 critical finds │
│ Known Patterns: prefers OWASP checklist approach │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Session 1 │ │Session 2 │ │Session 3 │ ... │
│ │ │ │ │ │ │ │
│ │ Fresh │ │ Fresh │ │ Fresh │ │
│ │ context │ │ context │ │ context │ │
│ │ window │ │ window │ │ window │ │
│ │ │ │ │ │ │ │
│ │ Dispose │ │ Dispose │ │ Dispose │ │
│ │ after │ │ after │ │ after │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ EPHEMERAL EXECUTION │
└─────────────────────────────────────────────────────┘
The persistent layer holds identity, skills, track record, and accumulated expertise. The ephemeral layer holds the current session's context window—tools, files, and working memory for the immediate task. Sessions are disposable. Identity accumulates.
The Three Layers
| Layer | Persists Across Sessions | Contains | Size |
|---|---|---|---|
| Identity | Always | Name, role, skills, configuration | Small (~500 tokens) |
| History | Always (append-only) | Past session summaries, decisions, outcomes | Growing (compressed) |
| Session | Never | Current context, working files, tool state | Large (fills context window) |
Seancing: Querying the Past
When an agent needs context from a prior session, it performs what Gas Town calls seancing—querying past session records for decisions and rationale. This is analogous to institutional memory in organizations: no employee remembers every meeting, but meeting notes make past decisions retrievable.
Current Session:
Agent: "What authentication approach did we choose for the API?"
│
▼ [searches session history]
Session 12 Summary:
"Chose JWT with RS256 over session cookies.
Rationale: stateless scaling, mobile client support.
Decision maker: Architecture-Agent-3."
│
▼
Agent: [proceeds with JWT context loaded]
Seancing differs from loading full session history. It is targeted retrieval—searching for specific decisions, patterns, or outcomes rather than replaying everything.
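A sketch of what targeted retrieval could look like against stored session summaries. The on-disk format and the naive keyword scoring are illustrative assumptions, not Gas Town's seancing implementation:

```python
# Seancing as targeted retrieval: search past session summaries for a specific
# decision instead of replaying full histories.
import json
from pathlib import Path

def seance(history_dir: Path, query: str, limit: int = 3) -> list[dict]:
    """Return the few past decisions most relevant to `query`."""
    terms = set(query.lower().split())
    scored = []
    for record_file in history_dir.glob("session-*.json"):
        record = json.loads(record_file.read_text())
        for decision in record.get("decisions", []):
            text = f"{decision['summary']} {decision.get('rationale', '')}".lower()
            score = sum(1 for t in terms if t in text)
            if score:
                scored.append((score, decision))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [decision for _, decision in scored[:limit]]

# e.g. seance(Path("history/agent-7"), "API authentication approach")
# -> [{"summary": "Chose JWT with RS256 over session cookies", ...}]
```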
What This Replaces
| Failing Model | Problem | Middle Path |
|---|---|---|
| Persistent sessions | Context bloat, degraded performance after 10+ interactions | Sessions are ephemeral; dispose after task |
| Stateless agents | Cannot learn, repeats mistakes, no institutional memory | Identity and history persist; sessions query them |
| Full history replay | Prohibitive token cost, irrelevant context dilution | Seancing retrieves targeted decisions only |
Identity Accumulation in Practice
The persistent identity layer grows through structured updates:
```yaml
# agent-7-cv.yaml (persists across all sessions)
name: Agent-7
role: security-reviewer
sessions_completed: 47
specializations:
  - authentication-flows
  - input-validation
  - cryptographic-implementations
notable_outcomes:
  - session_12: "Identified JWT key rotation gap"
  - session_31: "Caught SQL injection in parameterized query"
  - session_45: "Flagged timing attack in comparison logic"
patterns_learned:
  - "Check revocation lists when JWT validation present"
  - "Verify constant-time comparison for all secret comparisons"
```

Each session adds to the CV. New sessions restore context by reading the CV, not by replaying old sessions. The identity grows richer while each session starts clean.
Evidence
[2026-02-11]: Gas Town implements this model through "polecats"—agents with persistent CVs that accumulate track records across sessions. New sessions restore context via hook-based initialization that loads identity and queries relevant history. The CV structure enables capability-based routing: the orchestrator assigns security reviews to agents with demonstrated security expertise, not to the next available agent.
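Capability-based routing can be sketched directly on top of the CV format shown above. The scoring rule below (specialization overlap plus a small experience bonus) is an illustrative assumption, not Gas Town's routing logic:

```python
# Route a task to the agent whose persistent CV best matches its domain tags.
import yaml  # pip install pyyaml
from pathlib import Path

def route_task(task_tags: set[str], cv_dir: Path) -> str | None:
    """Pick the agent whose CV best matches the task's domain tags."""
    best_agent, best_score = None, 0.0
    for cv_file in cv_dir.glob("*-cv.yaml"):
        cv = yaml.safe_load(cv_file.read_text())
        overlap = len(task_tags & set(cv.get("specializations", [])))
        # Small experience bonus so a proven specialist wins a tie.
        score = overlap + 0.01 * cv.get("sessions_completed", 0)
        if score > best_score:
            best_agent, best_score = cv["name"], score
    return best_agent

# e.g. route_task({"authentication-flows"}, Path("cvs/"))  ->  "Agent-7"
```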
Implications for Practitioners
- Design identity schemas early. What persists across sessions determines what agents can learn. Invest in the structure of the persistent layer.
- Make sessions disposable by default. The temptation to preserve session state "in case it's useful" leads to bloat. Default to disposal; query when needed.
- Build seancing into workflows. Agents should actively query past decisions when entering domains they've worked in before. This is not automatic—it requires explicit tooling.
- Use CVs for routing. Agent identity enables intelligent task assignment. An orchestrator that knows Agent-7 found three critical security issues routes security-sensitive work to Agent-7.
Model 4: Work as Ledger
Every agent action is a timestamped entry in a permanent record.
The Core Idea
Most agent systems treat execution as ephemeral—work happens, results appear, the process vanishes. The ledger model treats execution as accounting: every action, decision, and outcome is a timestamped, attributed entry in a permanent record.
Fire-and-Forget Model: Ledger Model:
Task → Agent → Result Task → Agent → Result
│
▼
┌──────────────────────┐
│ LEDGER │
│ │
│ 14:23 Agent-7 READ │
│ auth.py (247 lines)│
│ │
│ 14:24 Agent-7 FOUND │
│ JWT validation gap │
│ severity: critical │
│ │
│ 14:25 Agent-7 WROTE │
│ fix.patch (12 lines)│
│ model: opus-4 │
│ tokens: 3,847 │
│ │
│ 14:26 Agent-7 DONE │
│ result: success │
└──────────────────────┘
What the Ledger Enables
Attribution is not a compliance afterthought—it is system intelligence. The ledger enables capabilities that fire-and-forget systems cannot support:
| Capability | How the Ledger Enables It |
|---|---|
| Debugging | Trace failures to specific agent actions and decisions |
| Capability routing | Route tasks to agents with proven track records |
| Performance management | Identify agents that consistently produce quality vs. rework |
| Model comparison | Compare output quality across models (opus vs. sonnet) for the same task type |
| Cost attribution | Know which agents and tasks consume the most tokens |
| Regression detection | Detect when agent performance degrades over time |
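A minimal append-only ledger sketch that supports these queries. The entry schema is an assumption for illustration; it is not the Bead system's actual format:

```python
# Append-only ledger: every entry carries a timestamp, an agent attribution,
# an action, and an outcome. Entries are never rewritten or closed retroactively.
import json
import time
from pathlib import Path

LEDGER = Path("ledger.jsonl")

def record(agent: str, action: str, detail: dict) -> None:
    entry = {"ts": time.time(), "agent": agent, "action": action, **detail}
    with LEDGER.open("a") as f:               # append-only; no batch closure
        f.write(json.dumps(entry) + "\n")

def track_record(agent: str, action: str = "DONE") -> list[dict]:
    """Query the ledger, e.g. for capability routing or regression detection."""
    entries = [json.loads(line) for line in LEDGER.open()]
    return [e for e in entries if e["agent"] == agent and e["action"] == action]

record("Agent-7", "FOUND", {"finding": "JWT validation gap", "severity": "critical"})
record("Agent-7", "DONE", {"task": "task-42", "result": "success", "tokens": 3847})
print(len(track_record("Agent-7")))  # -> 1
```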
The Batch-Closure Heresy
One anti-pattern threatens ledger integrity: closing entries retroactively. When an agent marks work as complete after the fact—or worse, when a supervisor closes entries on behalf of agents—the ledger loses its causal ordering.
Correct Ledger: Corrupted Ledger:
14:23 START task-42 14:23 START task-42
14:25 READ auth.py 14:50 [BATCH CLOSE]
14:27 FOUND vulnerability - READ auth.py ← no timestamp
14:28 WROTE fix.patch - FOUND vuln ← retroactive
14:29 COMPLETE task-42 - WROTE fix ← rewritten
- COMPLETE task-42 ← bundled
Causal order preserved. Causal order destroyed.
Debugging: straightforward. Debugging: impossible.
Batch closure is tempting because it reduces logging overhead. But it destroys the property that makes ledgers valuable: the ability to reconstruct exactly what happened, in what order, and why.
Ledger vs. Logging
The ledger model is not simply "add logging." Logs are diagnostic artifacts—searched when something breaks. The ledger is an operational artifact—queried continuously for routing, attribution, and improvement.
| Property | Logging | Ledger |
|---|---|---|
| Purpose | Diagnosis after failure | Continuous system intelligence |
| Queried | When something breaks | Every task assignment, every review |
| Retention | Rotated, compressed, archived | Permanent, append-only |
| Structure | Free-form text | Structured entries with agent, action, outcome |
| Attribution | Optional | Required for every entry |
| Consumers | Humans debugging | Orchestrators, agents, and humans |
When This Model Applies
Good fit:
- Multi-agent systems where multiple agents contribute to outcomes
- Systems requiring auditability or compliance
- Long-running projects where agent performance must improve over time
- Environments comparing multiple models or agent configurations
Poor fit:
- Single-agent, single-session interactions
- Prototyping where overhead must be minimal
- Tasks where the process is less important than the result
- Ephemeral scripts and one-off automation
Evidence
[2026-02-11]: Gas Town implements the ledger model through its "Bead" system—a permanent work record where every agent action is a timestamped, attributed entry. The Bead system enables capability routing (assigning tasks based on past performance), regression detection (flagging agents whose quality has declined), and cost attribution (tracking token spend per agent and task type). The "Batch-Closure Heresy"—closing entries retroactively—is explicitly forbidden because it corrupts the causal ordering that makes the ledger useful.
Implications for Practitioners
- Design attribution into the system from day one. Retrofitting attribution onto an existing system is orders of magnitude harder than building it in.
- Make the ledger queryable, not just writable. A ledger that can only be read by humans is a log. A ledger that orchestrators query for routing decisions is system intelligence.
- Enforce causal ordering. Every entry must have a timestamp and an agent attribution. Batch closures destroy the causal structure.
- Use the ledger for routing. Agent assignment should consider past performance on similar tasks, not just availability.
Model 5: Factory Floor vs. Workshop
The transition from 3 agents to 30 is not gradual—it is a phase change.
The Core Idea
Two distinct models exist for organizing agentic work:
WORKSHOP (1-3 agents): FACTORY FLOOR (10-30 agents):
┌─────────────────────┐ ┌──────────────────────────────────┐
│ │ │ ┌──────────┐ ┌──────────┐ │
│ Human ◄──► Agent │ │ │Supervisor│ │ Health │ │
│ ◄──► Agent │ │ │ Agent │ │ Monitor │ │
│ │ │ └─────┬────┘ └──────────┘ │
│ Direct management │ │ │ │
│ Conversational │ │ ┌─────┼──────────────────┐ │
│ Each agent visible │ │ │ │ Merge Queue │ │
│ │ │ │ ┌──┴──┐ ┌─────┐ ┌───┐│ │
└─────────────────────┘ │ │ │A│B│C│ │D│E│F│ │...││ │
│ │ └─────┘ └─────┘ └───┘│ │
│ └────────────────────────┘ │
│ │
│ Infrastructure required: │
│ - Supervisors │
│ - Merge queues │
│ - Health monitoring │
│ - Conflict resolution │
│ - Work attribution │
└──────────────────────────────────┘
The workshop is intimate. A human directly manages each agent, sees all output, and handles coordination through conversation. This works well for 1-3 agents—the cognitive load is manageable, and the overhead of infrastructure is not justified.
The factory floor is industrial. No human can directly manage 20+ agents. Infrastructure replaces direct management: supervisors route work, merge queues prevent conflicts, health monitors detect failures, and attribution tracks accountability.
The Phase Change
The transition between these models is not gradual. It is a phase change—a discontinuity where the old model stops working and new infrastructure becomes necessary.
Agent Count: 1 3 5 8 10 15 20 30
│ │ │ │ │ │ │ │
Workshop: ████████████████░░░░░░░░░░░░░░░░░░
▲
│ Phase transition
│ (chaos zone)
▼
Factory: ░░░░░░░░░░░░░░████████████████████
Infrastructure needed at each level:
1-3 agents: Nothing. Direct management works.
4-7 agents: Merge strategy. File conflicts appear.
8-12 agents: Supervisors. Humans cannot track all agents.
13-20 agents: Health monitoring. Silent failures occur.
20+ agents: Full infrastructure. Attribution, queues,
conflict resolution, automated routing.
Scale Infrastructure Table
| Agent Count | Infrastructure Required | Why |
|---|---|---|
| 1-3 | None | Human directly manages each agent |
| 4-7 | Merge strategy | Multiple agents editing overlapping files |
| 8-12 | Supervisors | Human cannot track status of all agents simultaneously |
| 13-20 | Health monitoring | Silent agent failures go undetected without automated checks |
| 20-30 | Full orchestration | Attribution, queuing, conflict resolution, automated routing |
| 30+ | Hierarchical management | Single orchestrator cannot manage all agents; sub-orchestrators needed |
Two Failure Modes
Premature Factory Infrastructure: Building supervisor agents, merge queues, and health monitors for a 2-agent system. The overhead exceeds the value. Simple direct management works better. The infrastructure becomes complexity without benefit—bureaucracy for a team of two.
Delayed Factory Infrastructure: Running 15 agents with workshop-style direct management. Merge conflicts multiply, silent failures go undetected, and the human becomes the bottleneck—unable to review all output, missing critical issues, and drowning in coordination overhead. The system appears to work but produces lower quality than a well-managed 5-agent workshop.
Steve Yegge's Eight Levels
Steve Yegge's framework for AI-assisted development describes a spectrum from Level 1 (AI as autocomplete) to Level 8 (AI-first development). Most practitioners operate at Levels 1-4 (workshop). Factory-floor systems like Gas Town target Levels 7-8, where the development model fundamentally changes.
| Level Range | Mode | Agent Relationship |
|---|---|---|
| 1-2 | Autocomplete, chat | AI assists human coding |
| 3-4 | Pair programming, delegation | AI handles defined subtasks |
| 5-6 | Supervised autonomy | AI works independently with checkpoints |
| 7-8 | Factory floor | Human designs, AI swarm implements |
The levels are not a maturity model where everyone should reach Level 8. They describe different modes suitable for different contexts. Most tasks are well-served at Levels 3-4. Factory-floor orchestration is necessary only when the task scope genuinely demands 10+ concurrent agents.
What This Replaces
| Workshop Assumption | Factory Floor Reality |
|---|---|
| "Add more agents to go faster" | More agents without infrastructure causes chaos |
| "I can manage 10 agents like I manage 2" | Cognitive load grows superlinearly with agent count |
| "Infrastructure is premature optimization" | Infrastructure becomes necessary at ~8 agents |
| "Scaling is linear" | Scaling has phase transitions requiring new architecture |
Evidence
[2026-02-11]: Gas Town's entire architecture is factory-floor infrastructure—supervisors, health monitoring, merge queues, work attribution, and automated routing. The system targets Yegge's Level 7-8 operations, where 20-30 agents execute in parallel. Teams that attempt Gas Town's scale without its infrastructure (or equivalent) consistently encounter the chaos zone: merge conflicts, silent failures, and human bottlenecks. The architecture itself is evidence that the phase transition is real and requires deliberate infrastructure investment.
When to Transition
The decision to move from workshop to factory is not about ambition—it is about observable symptoms:
Transition Signals (Workshop → Factory):
Merge conflicts appearing?
├── No → Stay in workshop
└── Yes
└── Agent failures going undetected?
├── No → Add merge strategy only
└── Yes
└── Human overwhelmed by review volume?
├── No → Add health monitoring
└── Yes → Build factory infrastructure
Each symptom maps to specific infrastructure. Build only what the symptoms demand—premature infrastructure is its own failure mode.
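The decision tree can be expressed as a small routine that maps observed symptoms to the infrastructure they demand and nothing more. The symptom names and the mapping below are illustrative assumptions:

```python
# Map observed symptoms to the infrastructure they demand -- and nothing more.

def infrastructure_for(symptoms: set[str]) -> list[str]:
    needed = []
    if "merge_conflicts" in symptoms:
        needed.append("merge strategy / merge queue")
    if "undetected_failures" in symptoms:
        needed.append("health monitoring")
    if "review_overload" in symptoms:
        needed += ["supervisor agents", "work attribution", "automated routing"]
    return needed or ["none -- stay in the workshop"]

print(infrastructure_for({"merge_conflicts"}))
# ['merge strategy / merge queue']
print(infrastructure_for({"merge_conflicts", "undetected_failures", "review_overload"}))
# full factory infrastructure
```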
Model 6: Interface as Bottleneck
When AI capability exceeds accessible capability, the interface is the constraint.
The Core Idea
The five preceding models address bottlenecks in multi-agent execution: design quality, agent coordination, state management, accountability, and infrastructure scale. Each operates on the developer/orchestrator side of the system. A complementary bottleneck operates at the human-facing layer: even when AI capability is ample, users can only access what the interface makes thinkable.
Ethan Mollick frames this as "capability overhang" — the gap between what AI can do and what users actually request. The cause is not model capability; it is interface design:
"We built one of the most powerful technologies in recent history and then made people access it by typing into a chat window."
The chat interface shapes cognition. A chat window makes users think in terms of "questions and responses." A task-dispatch interface makes users think in terms of "work delegation." The same model, different interaction paradigm, different effective capability.
Developer-Side Bottleneck (Models 1-5): User-Side Bottleneck (Model 6):
AI Capability AI Capability
│ │
▼ ▼
Design ┌──────────────┐
Specification │ INTERFACE │ ← accessible capability ceiling
Decomposition │ BOTTLENECK │
│ └──────────────┘
▼ │
Implementation ▼
│ What users
▼ think to ask
Output
(same model, different ceiling)
These two bottlenecks coexist and operate at different system levels. Addressing the developer-side bottleneck (better specs, faster implementation) does not resolve the user-side bottleneck. A perfectly specified multi-agent system that users cannot effectively interact with still underperforms.
Why the Interface Shapes Cognition
Three mechanisms explain why the interface determines the effective capability ceiling:
- Affordance gaps. Users cannot request capabilities they cannot perceive. Chat interfaces present a text field, which signals "ask questions." Caetano et al. (2025) document this directly: "Conversational user interfaces often lack clear signifiers for AI affordances." The interface does not reveal what AI can do; users discover capabilities through friction and accident.
- Interaction paradigm lock-in. Mollick's Claude Dispatch demonstration illustrates the alternative: from his phone, he delegated Claude to verify a graph in a PowerPoint, find updated data online, download a PDF, extract the graph, and update the slide — a task framed as delegation, not question-answering. The dispatch interface made this task sequence thinkable; a chat window would not have.
- Cognitive overhead. Research with financial professionals showed productivity gains from AI assistance were partially offset by interface-induced cognitive burden: walls of text, tangential suggestions, sprawling conversation threads. The interface imposes a processing cost that reduces net benefit.
Evidence
[2026-03-31]: Mollick's "Claude Dispatch and the Power of Interfaces" (One Useful Thing) demonstrates the interface bottleneck via Claude Dispatch: the same model accessed through a task-dispatch interface enables qualitatively different work than through chat. Primary practitioner evidence for the capability-overhang framing.
[2026-03-19]: Anthropic's Claude Cowork product page: "Unlike Chat, Cowork lets Claude complete work on its own. Describe the outcome and cadence, and it takes action." Anthropic's own positioning confirms the paradigm distinction between conversational interaction and asynchronous delegation.
[January 2025]: Caetano et al., "Agentic Workflows for Conversational Human-AI Interaction Design" (arxiv:2501.18002): research-through-design study with 10 participants documenting that users lack clarity about AI capabilities and that conversational interfaces fail to surface affordances. Provides academic grounding for the capability discovery problem.
[February 2026]: Andru & Saksena, "Interface Framework for Human-AI Collaboration" (arxiv:2602.22343): three-factor framework (workflow complexity, AI autonomy, AI reasoning) mapping to interface pattern selection. Describes interfaces as "necessary affordances bridging legacy SaaS tools with autonomous systems" and identifies "progressive user control" as a design principle for graduated human oversight as AI autonomy increases.
The Three-Layer Bottleneck Stack
All three layers have distinct bottlenecks. Resolving one does not resolve the others:
| Layer | Bottleneck | What It Controls |
|---|---|---|
| Developer/Specification | Design quality, decomposition precision | What the agent system can do |
| Interface Design | Affordance clarity, interaction paradigm | What users think to request |
| Model Capability | Reasoning quality, context handling | Execution quality for a given request |
Most capability-overhang conversations conflate the interface layer with the model layer ("the AI isn't good enough"). Mollick's argument is that for most users the interface layer binds before model capability does.
Implications for Practitioners
- Design the affordance surface explicitly. What can users see the system doing? What actions are surfaced vs. buried? A system that can delegate email processing but presents only a chat window effectively cannot delegate email processing.
- Match interaction paradigm to task type. Synchronous chat is appropriate for exploratory tasks requiring back-and-forth. Asynchronous delegation (Cowork/Dispatch model) is appropriate for well-scoped tasks that do not need mid-task steering. The paradigm shapes what users request.
- Treat interface design as a first-class engineering concern. Interface decisions are not UX polish applied after capability is built — they determine the accessible capability ceiling. The most capable agent system is limited by the interface through which it is reached.
- Consider progressive capability revelation. Andru & Saksena's "progressive user control" principle applies to capability discovery: surfaces that reveal AI affordances progressively (starting simple, growing richer as user confidence grows) outperform both overwhelming full-feature disclosure and impoverished minimal interfaces.
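As a rough illustration of progressive capability revelation, the sketch below gates richer delegation affordances behind demonstrated use of simpler ones. The tier contents and thresholds are hypothetical, not drawn from Andru & Saksena:

```python
# Progressive capability revelation: surface richer affordances as a user's
# demonstrated use grows, instead of all-at-once or chat-only.

AFFORDANCE_TIERS = [
    ["ask a question"],                                   # tier 0: chat only
    ["ask a question", "summarize a document",
     "delegate a single well-scoped task"],               # tier 1: light delegation
    ["ask a question", "summarize a document",
     "delegate a single well-scoped task",
     "dispatch asynchronous multi-step work",
     "schedule recurring work"],                          # tier 2: full delegation
]

def visible_affordances(tasks_completed: int, delegations_accepted: int) -> list[str]:
    """Reveal richer delegation affordances only after simpler ones are used."""
    if delegations_accepted >= 3:
        return AFFORDANCE_TIERS[2]
    if tasks_completed >= 5:
        return AFFORDANCE_TIERS[1]
    return AFFORDANCE_TIERS[0]

print(visible_affordances(tasks_completed=2, delegations_accepted=0))
# ['ask a question']
```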
When This Model Applies
Most relevant:
- User-facing agentic products where non-expert users interact with AI systems
- Internal tooling where adoption is below expectations despite model capability
- Situations where "the AI isn't good enough" feedback may actually be "the interface doesn't surface what the AI can do"
Less relevant:
- Developer-to-developer tooling where users have high model familiarity
- CLI-first tools where affordance discovery is expected to require documentation
- Pure orchestration systems where no human interacts with the interface at runtime
The Models in Combination
These six models form an interlocking system. Each addresses a different aspect of multi-agent work at scale:
┌─────────────────────────────────────────────────────────────────────┐
│ │
│ INTERFACE AS BOTTLENECK │
│ (what users can access) │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ DESIGN AS BOTTLENECK │ │
│ │ (what to invest in) │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────────────┐ ┌────────────────────────┐ │ │
│ │ │ AGENTS AS │ │ PERSISTENT IDENTITY, │ │ │
│ │ │ PISTONS │ │ EPHEMERAL EXECUTION │ │ │
│ │ │ │ │ │ │ │
│ │ │ (how agents │ │ (how agents persist │ │ │
│ │ │ execute) │ │ across sessions) │ │ │
│ │ └──────┬───────┘ └───────────┬────────────┘ │ │
│ │ │ │ │ │
│ │ └──────────┬──────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────────────────┐ │ │
│ │ │ WORK AS LEDGER │ │ │
│ │ │ │ │ │
│ │ │ (how work is │ │ │
│ │ │ tracked) │ │ │
│ │ └────────┬─────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────────────────┐ │ │
│ │ │ FACTORY FLOOR │ │ │
│ │ │ vs. WORKSHOP │ │ │
│ │ │ │ │ │
│ │ │ (what scale │ │ │
│ │ │ demands) │ │ │
│ │ └──────────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Design as Bottleneck identifies where to invest: upstream, in decomposition and specification.
Agents as Pistons defines the execution contract: fire-and-forget, no conversation overhead.
Persistent Identity, Ephemeral Execution solves the state management problem: agents learn without bloating.
Work as Ledger provides accountability: every action is recorded, attributed, and queryable.
Factory Floor vs. Workshop determines what infrastructure these models require at each scale level.
Interface as Bottleneck identifies the user-facing constraint: before agents receive any task, the interface determines what tasks users think to delegate. Capability overhang — the gap between what AI can do and what users request — is closed by interface design, not model capability.
Each model is useful independently. Together, they describe a coherent philosophy for building multi-agent systems that scale.
Anti-Patterns
Conversational Orchestration at Scale
The mistake: Using multi-turn message exchanges to assign work to 15+ agents. "Agent-3, are you available? Great, here's task 7. Agent-3, how's it going? Agent-3, are you done yet?"
Why it fails: Orchestration overhead grows quadratically with agent count. At 20 agents, the orchestrator spends more tokens coordinating than agents spend working.
The fix: Piston model. Hook fires, agent runs, completion signal returns. Zero conversational overhead.
Stateless Swarms
The mistake: Spawning fresh, identity-free agents for every task. No history, no expertise accumulation, no capability routing.
Why it fails: The system cannot improve. Agent-7's excellent security review in Session 12 is invisible to Session 13's task assignment. Every task is assigned to a random agent regardless of fit.
The fix: Persistent identity with ephemeral execution. CVs accumulate track records. Orchestrators query CVs for routing.
Fire-and-Forget Without Attribution
The mistake: Agents produce outputs but the system does not record who did what, when, or how. Results appear but the process is opaque.
Why it fails: Debugging requires replaying entire sessions. Performance improvement is impossible because there is no data. Cost optimization is guesswork because token spend is not attributed.
The fix: Ledger model. Every action is a timestamped, attributed entry. The ledger is queryable by orchestrators, not just searchable by humans.
Workshop Mentality at Factory Scale
The mistake: A human attempting to directly manage 20 agents the same way they manage 2. Reading every output, making every routing decision, manually resolving every conflict.
Why it fails: The human becomes the bottleneck. Agent throughput is limited by human review speed. The entire point of scaling—parallel throughput—is negated by serial human review.
The fix: Factory floor infrastructure. Supervisors, health monitoring, merge queues, automated routing. The human designs and reviews; infrastructure manages execution.
Vibe Designed
The mistake: Shipping a system whose architecture was never deliberately considered — metaphors chosen implicitly, primitives inherited rather than selected, interaction model emerging by accident rather than design.
Why it fails: A vibe-designed system may function and even scale; its design decisions will be paid for downstream when the implicit metaphors collide with requirements they were not built to handle. The failure is upstream of specification: no specification effort can compensate for primitives that were never examined. Maggie Appleton (January 2026) uses this framing to critique Gas Town directly — Yegge "absolutely did not design the shape of this system ahead of time, thoughtfully considering which metaphors and primitives would make this effective."
The fix: Treat metaphor and primitive selection as explicit design decisions before any implementation begins. Name the interaction model. State which abstractions are load-bearing. The upstream cost of deliberation is low; the downstream cost of renegotiating embedded metaphors is high.
Open Questions
- Where is the next bottleneck after design? If design automation improves (agents generating specifications from high-level intent), what becomes the new constraint? Requirements? Strategy? Taste?
- Can the piston model support creative work? Creative tasks may require the conversational back-and-forth that the piston model eliminates. Is there a hybrid model that preserves execution efficiency for implementation while allowing conversation for creative exploration?
- What is the optimal CV structure? Too detailed and the CV consumes excessive context. Too sparse and routing decisions lack signal. What information density maximizes routing quality per token?
- How large can ledgers grow before they become unwieldy? Append-only ledgers grow without bound. What compression, summarization, or archival strategies maintain queryability without unbounded growth?
- Is the phase transition predictable? The workshop-to-factory transition appears to occur around 8-12 agents, but this number likely varies by domain and task complexity. Can the transition point be predicted from task characteristics?
- What happens above 100 agents? Gas Town operates at 20-30 agents. Model-native swarm systems (e.g., Kimi K2.5) demonstrate 100+ concurrent subagents. Does factory-floor infrastructure scale to this level, or does a third organizational model emerge?
- How does the ledger interact with privacy constraints? In regulated environments, permanent attribution records may conflict with data retention policies. How does the ledger model adapt to privacy requirements?
- What happens when AI generates interfaces on-demand? Mollick argues the near-term trajectory involves AI generating the right interface for the moment — agent on desktop, chart in conversation, custom app — rather than users adapting to fixed interfaces. If this is directionally correct, the bottleneck shifts from "what interface exists" to "how well AI generates the appropriate interface for a given task and user." Does AI-generated interface quality become the next constraint in the stack?
Connections
- Execution Topologies: The five topologies describe shapes of agent work; these models describe principles for operating agent systems at scale. Factory floor infrastructure enables wider and deeper topologies. The piston model reduces friction in the measurement framework.
- Specs as Source Code: Design as Bottleneck is the logical consequence of treating specs as source code. If specifications are the primary programming surface, and implementation is automated, then specification quality is the constraint.
- Pit of Success: The piston model is a pit-of-success design for agent execution. The easiest path (hook fires, agent runs) is also the correct path. No negotiation means no negotiation failures.
- Self-Improving Experts: Persistent Identity, Ephemeral Execution is the mental model behind self-improving experts. The pattern (expertise files, session learning) implements the model (persistent identity, ephemeral sessions).
- Orchestrator Pattern: Factory floor infrastructure is what orchestrator patterns become at scale. The orchestrator pattern describes coordination; Factory Floor vs. Workshop describes when coordination requires dedicated infrastructure.
- Expert Swarm Pattern: Expert swarms operate at factory-floor scale. The swarm pattern implements the piston model (agents as execution units) and the ledger model (work attribution across swarm members).
- Workflow Coordination: Operational implementation of the infrastructure that factory-floor scale demands—merge strategies, health monitoring, and conflict resolution. See The Human Role in Agent-Managed Workflows for the three-part human role taxonomy that follows from the design-as-bottleneck constraint.
- Context Fundamentals: Ephemeral execution is a context management strategy. Disposing sessions and loading identity via CV keeps context windows fresh and relevant.
- Operating Agent Swarms: Operational practices for when the factory floor is running—cost management, incident response, and scale transitions that emerge once design bottlenecks are addressed.
- Claude Code: Claude Cowork/Dispatch (documented there) is the production instantiation of the interface bottleneck model — Anthropic's own product acknowledges the paradigm gap between chat and asynchronous delegation. See the Cowork entry in the Tips & Tricks section.
- Human-in-the-Loop: Human-in-the-Loop patterns address developer-designed approval gates; the interface bottleneck model addresses user-facing affordance design. These are complementary concerns at different system layers.
Sources
- Eliyahu M. Goldratt, The Goal (1984) — Theory of Constraints: every system has exactly one bottleneck limiting throughput
- Gas Town multi-agent orchestration platform — production evidence for piston model, persistent identity, ledger system, and factory-floor infrastructure
- Steve Yegge, "Eight Levels of AI-Assisted Development" — framework for classifying human-AI development modes
- Rico Mariani, .NET Framework design — original "Pit of Success" concept, applied here to agent execution contracts
- Maggie Appleton, "Gas Town's Agent Patterns, Design Bottlenecks, and Vibecoding at Scale" (January 2026) — six-factor code-proximity framework; "vibe designed" framing applied to Gas Town. https://maggieappleton.com/gastown/
- Mollick, E. "The Shape of the Thing." One Useful Thing, ~2026-03-12. https://www.oneusefulthing.org/p/the-shape-of-the-thing — independent organizational-research path to the design-as-bottleneck conclusion; RSI as destabilizing factor
- Mollick, E. "Centaurs and Cyborgs on the Jagged Frontier." One Useful Thing, 2023-09-16. https://www.oneusefulthing.org/p/centaurs-and-cyborgs-on-the-jagged — centaur/cyborg taxonomy; BCG 758-consultant study (cited secondarily via Mollick)
- Ethan Mollick, "Claude Dispatch and the Power of Interfaces" (One Useful Thing, ~2026-03-31) — practitioner evidence for capability-overhang framing and interface paradigm analysis: https://www.oneusefulthing.org/p/claude-dispatch-and-the-power-of
- Caetano et al., "Agentic Workflows for Conversational Human-AI Interaction Design" (arxiv:2501.18002, January 2025) — research grounding for affordance gap in conversational AI interfaces: https://arxiv.org/abs/2501.18002
- Andru & Saksena, "Interface Framework for Human-AI Collaboration" (arxiv:2602.22343, February 2026) — three-factor framework and progressive user control principle: https://arxiv.org/abs/2602.22343
- Claude Cowork product page (accessed 2026-04-11) — Anthropic's own positioning on chat vs. delegation paradigm distinction: https://claude.com/product/cowork