Harnesses | Agentic Engineering

The harness is everything around the model that makes it an agent.

Agent = Model + Harness

The four preceding foundational pillars — Prompt, Model, Context, Tool Use — answer what the agent says, what it can reason about, what it knows, and what it can do. The harness answers the fifth question: what system orchestrates and constrains the agent's execution? Without a harness, there is no agent — only a model being prompted. The harness is the connective tissue that makes multi-step, multi-tool, multi-session work possible.
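The formula can be made concrete with a minimal sketch. Here the model is assumed to be a plain function from a transcript to a next action, and the harness is the loop around it that dispatches tools, records observations, and enforces a step budget. Every name in the sketch (`Action`, `run_agent`, `fake_model`) is hypothetical and chosen for illustration; it is not taken from any particular harness.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    kind: str       # "tool" or "final"
    name: str = ""  # tool name when kind == "tool"
    arg: str = ""   # tool argument, or the answer text when kind == "final"

# Assumption: a "model" is any callable that maps the transcript to an Action.
Model = Callable[[list[str]], Action]

def run_agent(model: Model, tools: dict[str, Callable[[str], str]],
              task: str, max_steps: int = 5) -> str:
    """The harness: loops, dispatches tools, and bounds execution."""
    transcript = [f"task: {task}"]
    for _ in range(max_steps):
        action = model(transcript)
        if action.kind == "final":
            return action.arg
        tool = tools.get(action.name)
        if tool is None:
            # The harness, not the model, decides which tools exist.
            transcript.append(f"error: unknown tool {action.name!r}")
            continue
        transcript.append(f"{action.name}({action.arg}) -> {tool(action.arg)}")
    return "stopped: step budget exhausted"

def fake_model(transcript: list[str]) -> Action:
    # Scripted stand-in for a real model: one tool call, then a final answer
    # that repeats the last transcript entry.
    if len(transcript) == 1:
        return Action("tool", "upper", "hello")
    return Action("final", arg=transcript[-1])

result = run_agent(fake_model, {"upper": str.upper}, "shout hello")
```

Calling `fake_model` directly yields a single action and nothing more. The multi-step behavior exists only inside `run_agent`, which is the point of the formula.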

[2026-04-12]: Practitioner consensus around this formula crystallized in early 2026. Martin Fowler, Ethan Mollick, Sebastian Raschka, Philipp Schmid, and Mitchell Hashimoto independently converged on the harness as the main factor separating raw model capability from product performance. They did not coordinate; they arrived at the same empirical observation at around the same time.

Chapter Overview

This chapter builds from definition through practical design guidance:

1. What Is a Harness?

The foundational definition, formula, and conceptual underpinnings. Establishes the horse harness metaphor (raw capability → useful work), presents definitions from five authoritative practitioners, distinguishes harness from scaffold, and traces the historical evolution from prompt engineering through context engineering to harness engineering.

Key concepts:

Agent = Model + Harness formula
Scaffold (pre-runtime) vs. harness (runtime) distinction
The three eras: prompt engineering, context engineering, harness engineering
Why harness is a foundational pillar, not a peripheral concern

2. The Harness Stack

Raschka's six-component taxonomy of the execution environment. Each component has distinct responsibilities, failure modes, and optimization targets. Understanding the stack lets practitioners audit agent failures at the right layer rather than blaming the model by default.

Key concepts:

Workspace context (stable facts, repo map)
Prompt shape and cache reuse (stable/dynamic split)
Tool access (bounded inventories, permission filtering)
Context management (clipping, deduplication, compression)
Session memory (working memory + full transcript duality)
Subagent delegation (bounded spawning, context scoping)
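
Two of the components above, prompt shape and context management, can be sketched in a few lines. This is a sketch under stated assumptions, not Raschka's implementation: it assumes a provider that caches by prompt prefix, so stable content goes first, and it uses simple character-based clipping. The function names `build_prompt`, `dedupe`, and `clip` are hypothetical.

```python
def build_prompt(stable: list[str], dynamic: list[str]) -> str:
    # Stable facts (system rules, repo map) come first so that a
    # prefix-based cache can reuse them unchanged across turns.
    return "\n".join(stable + dynamic)

def dedupe(items: list[str]) -> list[str]:
    # Keep only the most recent copy of each repeated item,
    # preserving the relative order of what remains.
    seen: set[str] = set()
    kept: list[str] = []
    for item in reversed(items):
        if item not in seen:
            seen.add(item)
            kept.append(item)
    return kept[::-1]

def clip(text: str, limit: int) -> str:
    # Truncate an oversized tool output and say how much was dropped,
    # so the model knows the observation is incomplete.
    if len(text) <= limit:
        return text
    return text[:limit] + f"... [{len(text) - limit} chars clipped]"

stable = ["You are a coding agent.", "Repo map: src/, tests/"]
turn_1 = build_prompt(stable, ["user: run the tests"])
turn_2 = build_prompt(stable, ["user: run the tests", "tool: 3 passed"])
```

Both turns share the same leading text, which is what makes cache reuse possible. Only the dynamic tail changes between them.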

3. Harness Categories

The taxonomy of harness types from raw coding agents to full workspace managers. Includes Mollick's three-axis Model/App/Harness stack, capability tier comparison, and the distinction between frameworks (build-time) and harnesses (runtime with defaults). Provides a decision table linking problem characteristics to harness category.

Key concepts:

Model/App/Harness as independent axes
Harness capability tiers (full agentic / web-based / constrained)
Framework vs. runtime vs. harness vocabulary
Workflow harnesses, workspace managers, persistent personal runtimes
Category selection decision table

4. Harness as Control System

Fowler's guides-and-sensors decomposition treats the harness as an active control system, not passive scaffolding. Guides intervene before agent actions (feedforward); sensors observe results and steer subsequent behavior (feedback). Each mechanism can be computational (deterministic) or inferential (model-based).

Key concepts:

Guides (feedforward control) — computational and inferential
Sensors (feedback control) — computational and inferential
The cost tradeoff: computational vs. inferential mechanisms
Agent Psychometrics formula: scaffold quality is additively independent from LLM capability
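
The guide and sensor roles can be sketched with computational (deterministic) mechanisms only. In this illustrative sketch a guide inspects a command before it runs and may block it, and a sensor inspects the output afterwards and may return feedback for the agent's next step. The names `no_force_push`, `flag_failures`, and `controlled_step` are hypothetical, and the string checks are deliberately naive.

```python
from typing import Callable, Optional

Guide = Callable[[str], Optional[str]]   # returns a reason to block, or None
Sensor = Callable[[str], Optional[str]]  # returns feedback text, or None

def no_force_push(command: str) -> Optional[str]:
    # Feedforward: runs before the action is executed.
    if "push" in command and "--force" in command:
        return "force pushes are not allowed"
    return None

def flag_failures(output: str) -> Optional[str]:
    # Feedback: runs on the result of the action.
    if "FAILED" in output:
        return "tests failed; fix them before continuing"
    return None

def controlled_step(command: str, execute: Callable[[str], str],
                    guides: list[Guide], sensors: list[Sensor]) -> list[str]:
    """Run one action under guides and sensors; return notes for the agent."""
    for guide in guides:
        reason = guide(command)
        if reason:
            return [f"blocked: {reason}"]  # the action never executes
    output = execute(command)
    notes = [f"output: {output}"]
    for sensor in sensors:
        feedback = sensor(output)
        if feedback:
            notes.append(f"feedback: {feedback}")
    return notes

blocked = controlled_step("git push --force", lambda c: "ok",
                          [no_force_push], [flag_failures])
steered = controlled_step("pytest", lambda c: "1 FAILED",
                          [no_force_push], [flag_failures])
```

An inferential guide or sensor would have the same signature but would call a model inside the function, which is where the cost tradeoff between the two kinds of mechanism shows up.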

5. Harness Engineering

Hashimoto's methodology for systematic harness improvement; the term "harness engineering" was coined on February 5, 2026. The core discipline: when an agent makes a mistake, engineer the surrounding system so that mistake cannot recur. Covers the six-step engineering loop, failure classification taxonomy, and Schmid's trajectory-capture competitive advantage thesis.

Key concepts:

Harness engineering vs. prompt patching
The six-step improvement loop
Failure classification (model / context / prompt / harness / tool)
Trajectory capture as competitive infrastructure
Verification-driven development as harness engineering in practice

6. Security, Permissions, and Trust

The harness is the primary security boundary in an agentic system. The model does not enforce permissions — the harness does. Covers permission models (scope/operation/session dimensions), sandbox architecture, token-level vs. session-level access control, trust hierarchies in multi-agent systems, and observability requirements.

Key concepts:

Harness enforcement vs. model enforcement
Sandbox dimensions (filesystem, network, process, resource)
Token-level vs. session-level access control
Principle of least privilege in multi-agent trust hierarchies
Observability surfaces: tool call logs, permission logs, session transcripts
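
Harness-side enforcement can be sketched as follows. The check runs in harness code before any tool executes, so it holds regardless of what the model outputs. This is an illustrative sketch only: the `Permissions` class, its fields, and the tool names are hypothetical, and a real sandbox would also need to cover network, process, and resource limits.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

@dataclass
class Permissions:
    allowed_tools: set[str]
    writable_root: str                            # filesystem scope
    log: list[str] = field(default_factory=list)  # permission log

    def check(self, tool: str, path: str) -> bool:
        target = PurePosixPath(path)
        # Reject relative paths and ".." segments instead of resolving them.
        inside = (target.is_absolute()
                  and ".." not in target.parts
                  and target.is_relative_to(self.writable_root))
        allowed = tool in self.allowed_tools and inside
        # Every decision is logged, whether it is an allow or a deny.
        self.log.append(f"{'allow' if allowed else 'deny'} {tool} {path}")
        return allowed

    def narrowed(self, tools: set[str]) -> "Permissions":
        # Least privilege: a subagent gets the intersection of what it asks
        # for and what its parent holds, never more. The log is shared.
        return Permissions(self.allowed_tools & tools, self.writable_root, self.log)

parent = Permissions({"read_file", "write_file"}, "/repo")
ok = parent.check("write_file", "/repo/src/a.py")
escaped = parent.check("write_file", "/repo/../etc/passwd")
child = parent.narrowed({"read_file", "shell"})
```

The subagent asked for `shell` but does not receive it, because the parent never held it. The denied path traversal still appears in the permission log, which is what makes it auditable.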

7. Designing for Your Context

Translates the conceptual framework into actionable decisions. Presents the four design questions (time horizon, agent count, security requirements, trajectory volume), a decision tree for harness selection, the incremental build sequence, and the compound advantage that trajectory capture creates over time.

Key concepts:

Four design questions for harness selection
Decision tree: exploratory → workflow → workspace manager → custom
Incremental build sequence starting from existing harnesses
Compound advantage through trajectory capture

Core Questions

This chapter explores:

Definition: What distinguishes a harness from a prompt, a tool, or a framework?
Stack: What components comprise a harness, and what does each one do?
Categories: Which harness type fits which problem?
Control: How does the harness steer agent behavior before and after actions?
Engineering: How does the harness improve through observed failures?
Security: How does the harness enforce permissions and trust boundaries?
Design: How should practitioners make harness design decisions?

The Short Version

Default to an existing full agentic harness for most work. Building a harness from scratch is a significant investment. Claude Code, Codex, and equivalent tools ship with their vendors' harness defaults already set. Start there.

When an agent fails repeatedly, the first audit target is the harness, not the model. Raschka's diagnostic: "much of apparent model quality is really context quality." Context quality is a harness concern.

Instrument the harness for trajectory capture from the beginning; this is infrastructure, not a nice-to-have. Every session run through a well-instrumented harness produces training data, evaluation data, and edge-case documentation. The value compounds over months.
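
A minimal sketch of trajectory capture, assuming an append-only JSON Lines log with one event per line. The `TrajectoryLog` class, the `load` helper, and the event fields are hypothetical choices for illustration, not a standard schema.

```python
import io
import json
import time
from typing import Any, TextIO

class TrajectoryLog:
    """Appends one JSON object per event to a text sink (file or buffer)."""

    def __init__(self, sink: TextIO, session_id: str) -> None:
        self.sink = sink
        self.session_id = session_id
        self.step = 0

    def record(self, kind: str, **payload: Any) -> None:
        self.step += 1
        event = {"session": self.session_id, "step": self.step,
                 "ts": time.time(), "kind": kind, **payload}
        self.sink.write(json.dumps(event) + "\n")

def load(text: str) -> list[dict]:
    # Read the log back for evaluation or failure analysis.
    return [json.loads(line) for line in text.splitlines() if line]

sink = io.StringIO()  # stands in for an open log file
log = TrajectoryLog(sink, "session-1")
log.record("tool_call", tool="grep", args={"pattern": "TODO"})
log.record("tool_result", tool="grep", ok=True)
events = load(sink.getvalue())
```

Because each line is self-contained, sessions can be appended to the same file and filtered later by `session` or `kind` without any migration step.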

Connections

To Foundations: The harness is the fifth pillar, joining Prompt, Model, Context, and Tool Use. The Twelve Leverage Points chapter identifies execution-layer levers that map directly to harness components.
To Prompt: The harness manages prompt shape — the stable/dynamic split that enables cache reuse. Prompt engineering operates within constraints the harness establishes.
To Model: Harness and model capability are additively independent (Agent Psychometrics formula). Harness investment yields gains regardless of model selection.
To Context: Context management is a harness component. The context chapter covers strategies in depth; this chapter owns the architectural framing.
To Tool Use: Tool access is a harness concern — the harness defines what the agent can do. Tool design principles operate within the access layer the harness enforces.
To Patterns: The Orchestrator Pattern, Autonomous Loops, and ReAct patterns all describe execution flows that harnesses implement. The harness is the runtime that makes patterns operational.
To Practices: Debugging, cost management, and production operations are all harness-level concerns. The practices chapter provides operational depth; this chapter provides the conceptual foundation.
To Practitioner Toolkit: Specific harness implementations — Claude Code, Google ADK, LangGraph — are evaluated in the Toolkit chapter. This chapter provides the framework for evaluating them.