Summary of "How does Claude Code *actually* work?"
High-level summary
Central idea: a “harness” is the runtime, tools, and environment that let an LLM go beyond text generation (read/edit files, run shell commands, web search, etc.). The model still only produces text; the harness executes tool calls, manages context, permissions, and the back-and-forth with the model — and that dramatically changes code-assistant quality.
Mechanism (tool-calling flow)
- The model responds with a structured tool call (special syntax).
- The harness executes the tool (e.g., bash, read file, edit file), optionally asking the user for permission on destructive actions.
- The harness appends the tool output to the chat history/context and re-queries the model so it continues from the new state.
Each tool call effectively pauses the model’s “brain” and restarts it with the appended output; that loop is the core of how code assistants operate.
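That pause/execute/resume loop can be sketched in a few lines of Python. This is an illustrative stub, not Claude Code's implementation: call_model and run_tool are hypothetical names, and the model here is a canned script rather than a real LLM API.

```python
# Minimal sketch of the pause/execute/resume loop described above.
def run_tool(name, args):
    # Hypothetical dispatch; a real harness maps names to real side effects.
    if name == "read_file":
        return "print('hello')"  # pretend this is the file's content
    raise ValueError(f"unknown tool: {name}")

def call_model(messages):
    # Stub standing in for an LLM API call: request a tool on the first
    # turn, then answer once the tool output is in the history.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "read_file", "args": {"path": "main.py"}}}
    return {"text": "main.py prints 'hello'."}

def agent_loop(user_prompt):
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = call_model(messages)
        if "tool_call" in reply:
            call = reply["tool_call"]
            output = run_tool(call["name"], call["args"])          # harness does the work
            messages.append({"role": "tool", "content": output})   # append, then re-query
        else:
            return reply["text"]  # no tool call: the model is done
```

The key point is that the model never touches the filesystem; it only emits the request, and the harness decides whether and how to execute it.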
Key technological concepts and best practices
Tools vs. models
- The harness supplies tools (read/list/edit/bash/etc.) and tells the model how to call them via the system prompt.
- The model learns to request tools but cannot perform real system actions without the harness.
Minimal core toolset
- For many simple harnesses you only need three basic tools: read file, list files, edit file.
- Optionally add bash, web search, LLM-to-LLM helpers.
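As a sketch, the three core tools need nothing beyond the standard library. The names and signatures below are illustrative, not Claude Code's actual tool API:

```python
# Sketch of the three core tools (read, list, edit) using only the stdlib.
import os

def read_file(path):
    with open(path) as f:
        return f.read()

def list_files(directory="."):
    # Return names plus a file/dir flag so the model can plan its next call.
    return [
        {"name": e.name, "type": "dir" if e.is_dir() else "file"}
        for e in os.scandir(directory)
    ]

def edit_file(path, old, new):
    # Replace `old` with `new` if the file exists and contains it;
    # otherwise (new file, or no match) write `new` as the full content.
    if os.path.exists(path):
        text = read_file(path)
        text = text.replace(old, new) if old and old in text else new
    else:
        text = new
    with open(path, "w") as f:
        f.write(text)
```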
System prompt and tool descriptions
- Define tools and usage rules in the system prompt (or via the API’s tools field). This is how the model learns which tools exist and how to invoke them.
- Tweaking wording, deprecations, or guidance in tool descriptions strongly steers model behavior.
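A system prompt for a plain-text tool protocol might look like the sketch below. The wording is illustrative and assumes the demo's "tool: name" convention, not any provider's official format:

```python
# Sketch of a system prompt that teaches the model a plain-text tool
# protocol. The tool list and rules are illustrative.
SYSTEM_PROMPT = """You are a coding assistant with access to tools.
To call a tool, reply with exactly one line of the form:
tool: <name> <json-args>

Available tools:
- read_file {"path": ...}                    Read a file and return its contents.
- list_files {"dir": ...}                    List entries in a directory.
- edit_file {"path": ..., "old": ..., "new": ...}   Replace old with new.

Only call a tool when you need information you do not already have.
"""
```

Because this text is all the model ever sees about its tools, small wording changes here (marking a tool deprecated, adding a usage rule) directly steer which tools it picks.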
Context management and bootstrapping
- The model only knows what’s in the chat history (or pre-seeded context such as AGENTS.md / CLAUDE.md files). If context is preloaded, the model may skip exploratory calls.
- Tool calls are cheap; a harness can fetch just the pieces the model needs, so you rarely have to copy an entire codebase into context.
- Large context windows or stuffing entire repos often degrade accuracy; prefer targeted tool-driven context building.
Security & UX
- Harnesses should implement permission checks for destructive operations.
- Different UIs/harnesses handle prompting and permissions differently.
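One way to gate destructive tools is to intercept calls before dispatch. The DESTRUCTIVE set and the injected confirm callback below are assumptions for the sketch, not any particular harness's design:

```python
# Sketch of a permission gate for destructive tool calls.
DESTRUCTIVE = {"edit_file", "bash"}  # assumed set of side-effecting tools

def run_tool_with_permission(name, args, tools, confirm):
    # `tools` maps tool names to callables; `confirm` is a callable
    # (e.g. an input() prompt in a CLI) injected so UIs can differ.
    if name in DESTRUCTIVE and not confirm(f"Allow {name} with {args}?"):
        return "Permission denied by user."
    return tools[name](**args)
```

Returning the denial as a plain string matters: it goes back into the model's context, so the model can adjust its plan instead of crashing the loop.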
Implementation reality (practical pattern)
- A simple, functional harness can be small (tens to a few hundred lines).
- Typical components shown in the demo:
- read_file(path) → returns file content (e.g., JSON)
- list_files(dir) → returns filenames/types
- edit_file(path, old, new) → replace or write
- a tool registry and a system prompt instructing the model to emit lines like: tool: name
- a loop: send messages to model, parse tool-invocations, run tools, append outputs, re-call the model
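Those components could be wired together roughly as follows. The decorator-based registry and the "tool: name {json}" line format are assumptions modeled on the demo's description, not its literal code:

```python
# Sketch of the demo's tool registry plus the parse/run/append step.
import json

TOOLS = {}

def tool(fn):
    TOOLS[fn.__name__] = fn  # register each tool under its function name
    return fn

@tool
def echo(text):
    return text  # trivial example tool for the sketch

def parse_tool_line(line):
    # Expect: 'tool: <name> <json-args>'; return None for a normal answer.
    if not line.startswith("tool:"):
        return None
    name, _, raw_args = line[len("tool:"):].strip().partition(" ")
    return name, json.loads(raw_args or "{}")

def step(model_reply, history):
    # One turn of the loop: either execute a tool (and signal the caller
    # to re-query the model) or pass the final answer through.
    parsed = parse_tool_line(model_reply.strip())
    if parsed is None:
        return model_reply
    name, args = parsed
    output = TOOLS[name](**args)  # the harness performs the side effect
    history.append({"role": "tool", "name": name, "content": output})
    return None
```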
Product, feature, and analysis highlights
Benchmark finding
- Moving the same models into a harness produced meaningful improvements (example: Opus from 77% to 93% in a benchmark by Matt Mayer).
Cursor vs. provider-native harnesses
- Cursor invests heavily in prompt/tool tuning, vector indexing/search, and per-model adjustments. That micro-tuning is why hosted UIs like Cursor often outperform raw provider UIs.
Model differences
- Different models (Claude, Opus, Gemini, GPT variants) behave differently even under the same harness. Harness authors must tune tool descriptions and prompts per model.
The harness controls the model’s view
- Returning fake tool outputs or modifying tool docs will change the model’s behavior — the model trusts the harness-provided data.
T3 Code vs. harness
- T3 Code is a UI that can surface multiple harnesses (e.g., Claude Code, Codex). T3 Code itself doesn’t implement tools; it wraps harnesses installed on the client machine.
Best practices (summary)
- Provide clear tool docs in the system prompt and iterate/tune them.
- Prefer targeted tool-driven context building over stuffing whole repos.
- Implement permission checks for dangerous edits.
- Use search/indexing to help the model find relevant files rather than dumping large contexts.
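A targeted search tool in that spirit might look like this sketch. The name search_files is hypothetical, and real harnesses like Cursor use vector indexes rather than a plain regex scan:

```python
# Sketch of a simple text-search tool for targeted context retrieval.
import os
import re

def search_files(pattern, directory="."):
    # Return (path, line_no, line) matches so the model can then read
    # only the files it actually needs.
    hits = []
    rx = re.compile(pattern)
    for root, _dirs, files in os.walk(directory):
        for fname in files:
            path = os.path.join(root, fname)
            try:
                with open(path, encoding="utf-8") as f:
                    for i, line in enumerate(f, 1):
                        if rx.search(line):
                            hits.append((path, i, line.rstrip()))
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
    return hits
```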
Tutorials, guides, and references mentioned
Two recommended write-ups for building harnesses:
- AMP team article (April, previous year) — shows a simple harness mental model and implementation.
- Mah’s article “The Emperor Has No Clothes” — demonstrates a straightforward, small Python harness.
Video’s hands-on demo
- Builds a minimal Python harness (tool registry, system prompt, loop).
- Demonstrates:
- running with read/list/edit tools,
- switching to a bash-only tool and shrinking code,
- how changing tool descriptions affects model choices,
- how CLAUDE.md / AGENTS.md files can be used to pre-seed context.
Historical/other approaches referenced
- Repomix (squashing entire repos into context; now discouraged)
- vector indexing/search (used by Cursor)
Security, limitations, and caveats
- Models only generate text; the harness is responsible for actual side effects — harness security matters a lot.
- Models can be misled by incorrect tool outputs or misleading tool docs; harnesses should validate outputs where possible.
- Provider restrictions vary: some providers restrict using your paid account with third-party tools (Anthropic, Google), while others (OpenAI) are more permissive — this affects which harnesses you can use with your subscription.
References, people, and sources cited
People and authors
- Matt Mayer — benchmark showing harness-driven improvements
- Mah — author of “The Emperor Has No Clothes”
- AMP team — guide/article on building harnesses
- Video speaker / YouTube channel host — main narrator and demonstrator
Products/platforms referenced
- Claude Code (Anthropic), Cursor, Codex, T3 Code (UI)
- OpenRouter, OpenAI, Gemini, Opus, Sonnet
- Repomix (historic)
- Macroscope (sponsor — AI code reviewer/insights tool)
Notes about the demo code pattern
The video includes a hands-on Python demo implementing the pattern described above (tool stubs, system prompt example, parsing loop). A compact snippet or checklist can be extracted from that demo for building a small harness: read/list/edit tools, a tool registry, and a loop that parses "tool: name" lines, runs the tool, appends the output, and re-queries the model.