What actually makes a coding agent work: the six parts of the harness

Engineering · April 4, 2026 · source (magazine.sebastianraschka.com)

Sebastian Raschka takes apart what sits between the model and a working coding agent. The harness wraps the model in a loop of observation, inspection, choice, and action, and it manages six things. Live repo context pulls Git status, branch, and layout up front so "fix the tests" has meaning. Prompt shape keeps a stable, cacheable prefix and updates only the recent history, which is what makes long sessions affordable. Tool access replaces prose suggestions with predefined tools that the harness validates, permission-checks, runs, and whose results it feeds back to the model.
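To make the tool-access step concrete, here is a minimal Python sketch of a harness validating, permission-checking, and running a tool call, then returning text for the model. It is an illustration under assumptions, not code from the article: the `Tool` record, the `dispatch` function, the `allow_writes` flag, and the two toy tools are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    # Hypothetical tool record; the field names are assumptions for this sketch.
    name: str
    required_args: tuple[str, ...]
    read_only: bool
    run: Callable[[dict], str]


def dispatch(tools: dict[str, Tool], call: dict, allow_writes: bool) -> str:
    """Validate, permission-check, and run one tool call; return text for the model."""
    tool = tools.get(call.get("name"))
    if tool is None:
        return f"error: unknown tool {call.get('name')!r}"

    # Validation: every required argument must be present.
    args = call.get("args", {})
    missing = [a for a in tool.required_args if a not in args]
    if missing:
        return f"error: missing arguments {missing}"

    # Permission check: tools that modify state need explicit approval.
    if not tool.read_only and not allow_writes:
        return f"error: {tool.name} needs write permission"

    # Run the tool; failures are reported back as text instead of crashing the loop.
    try:
        return tool.run(args)
    except Exception as exc:
        return f"error: {type(exc).__name__}: {exc}"


# Toy tools for illustration only.
TOOLS = {
    "echo": Tool("echo", ("text",), True, lambda a: a["text"]),
    "write_note": Tool("write_note", ("text",), False, lambda a: "written"),
}
```

Returning errors as plain strings is one possible design choice: the model sees the failure in its next turn and can correct the call, rather than the session ending on an exception.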

The other three handle the long tail. Context reduction clips long outputs, deduplicates old file reads, and compresses older parts of the transcript more aggressively than recent events. Structured session memory keeps two layers, a durable full transcript and a distilled working memory, so a task can resume. Bounded subagents inherit enough context to help but run read-only or under recursion limits so they cannot spawn further agents unchecked. The throughline is that the harness, not raw chat, is what makes the same model competent. Read the full piece on Ahead of AI.
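Two of the context-reduction moves, clipping long outputs and deduplicating old file reads, can be sketched in a few lines of Python. This is a rough sketch under assumptions, not the article's implementation: the event dictionaries with `kind`, `content`, and `path` keys, the function names, and the character limit are all hypothetical, and the age-based transcript compression is left out.

```python
def clip(text: str, limit: int = 2000) -> str:
    """Keep the head and tail of an overly long output and note what was dropped."""
    if len(text) <= limit:
        return text
    half = max(limit // 2, 1)
    omitted = len(text) - 2 * half
    return f"{text[:half]}\n[... {omitted} characters clipped ...]\n{text[-half:]}"


def reduce_context(events: list[dict], limit: int = 2000) -> list[dict]:
    """Return a reduced copy of `events` (oldest first); the input is not modified.

    Assumed event shape: {"kind": ..., "content": ...}, plus "path" for file reads.
    """
    # Remember the index of the most recent read of each file.
    latest_read = {}
    for i, ev in enumerate(events):
        if ev["kind"] == "file_read":
            latest_read[ev["path"]] = i

    reduced = []
    for i, ev in enumerate(events):
        if ev["kind"] == "file_read" and latest_read[ev["path"]] != i:
            # An older read of the same file: replace its body with a short stub.
            stub = f"[superseded by a later read of {ev['path']}]"
            reduced.append({**ev, "content": stub})
        else:
            reduced.append({**ev, "content": clip(ev["content"], limit)})
    return reduced
```

Because the function returns a copy, the full transcript stays intact and can serve as the durable layer, while the reduced list is what gets sent to the model.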

Why it matters

If you are building or evaluating an agent, this is the checklist. When an agent fails, the cause is usually one of these six components, not the model, so naming them tells you where to look before you reach for a bigger model.

Agents