Summary of "How does Claude Code *actually* work?"
High-level summary
Central idea: a “harness” is the runtime, tools, and environment that let an LLM go beyond text generation (read/edit files, run shell commands, web search, etc.). The model still only produces text; the harness executes tool calls, manages context, permissions, and the back-and-forth with the model — and that dramatically changes code-assistant quality.
Mechanism (tool-calling flow)
- The model responds with a structured tool call (special syntax).
- The harness executes the tool (e.g., bash, read file, edit file), optionally asking the user for permission on destructive actions.
- The harness appends the tool output to the chat history/context and re-queries the model so it continues from the new state.
Each tool call effectively pauses the model’s “brain” and restarts it with the appended output; that loop is the core of how code assistants operate.
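That pause/execute/resume loop can be sketched in a few lines of Python. This is an illustrative stub, not Claude Code's implementation: call_model and run_tool are hypothetical names, and the model here is a canned script rather than a real LLM API.

```python
# Minimal sketch of the pause/execute/resume loop described above.
def run_tool(name, args):
    # Hypothetical dispatch; a real harness maps names to real side effects.
    if name == "read_file":
        return "print('hello')"  # pretend this is the file's content
    raise ValueError(f"unknown tool: {name}")

def call_model(messages):
    # Stub standing in for an LLM API call: request a tool on the first
    # turn, then answer once the tool output is in the history.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "read_file", "args": {"path": "main.py"}}}
    return {"text": "main.py prints 'hello'."}

def agent_loop(user_prompt):
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = call_model(messages)
        if "tool_call" in reply:
            call = reply["tool_call"]
            output = run_tool(call["name"], call["args"])          # harness does the work
            messages.append({"role": "tool", "content": output})   # append, then re-query
        else:
            return reply["text"]  # no tool call: the model is done
```

The key point is that the model never touches the filesystem; it only emits the request, and the harness decides whether and how to execute it.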
Key technological concepts and best practices
Tools vs. models
- The harness supplies tools (read/list/edit/bash/etc.) and tells the model how to call them via the system prompt.
- The model learns to request tools but cannot perform real system actions without the harness.
Minimal core toolset
- For many simple harnesses you only need three basic tools: read file, list files, edit file.
- Optionally add bash, web search, LLM-to-LLM helpers.
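As a sketch, the three core tools need nothing beyond the standard library. The names and signatures below are illustrative, not Claude Code's actual tool API:

```python
# Sketch of the three core tools (read, list, edit) using only the stdlib.
import os

def read_file(path):
    with open(path) as f:
        return f.read()

def list_files(directory="."):
    # Return names plus a file/dir flag so the model can plan its next call.
    return [
        {"name": e.name, "type": "dir" if e.is_dir() else "file"}
        for e in os.scandir(directory)
    ]

def edit_file(path, old, new):
    # Replace `old` with `new` if the file exists and contains it;
    # otherwise (new file, or no match) write `new` as the full content.
    if os.path.exists(path):
        text = read_file(path)
        text = text.replace(old, new) if old and old in text else new
    else:
        text = new
    with open(path, "w") as f:
        f.write(text)
```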
System prompt and tool descriptions
- Define tools and usage rules in the system prompt (or via the API’s tools field). This is how the model learns which tools exist and how to invoke them.
- Tweaking wording, deprecations, or guidance in tool descriptions strongly steers model behavior.
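A system prompt for a plain-text tool protocol might look like the sketch below. The wording is illustrative and assumes the demo's "tool: name" convention, not any provider's official format:

```python
# Sketch of a system prompt that teaches the model a plain-text tool
# protocol. The tool list and rules are illustrative.
SYSTEM_PROMPT = """You are a coding assistant with access to tools.
To call a tool, reply with exactly one line of the form:
tool: <name> <json-args>

Available tools:
- read_file {"path": ...}                    Read a file and return its contents.
- list_files {"dir": ...}                    List entries in a directory.
- edit_file {"path": ..., "old": ..., "new": ...}   Replace old with new.

Only call a tool when you need information you do not already have.
"""
```

Because this text is all the model ever sees about its tools, small wording changes here (marking a tool deprecated, adding a usage rule) directly steer which tools it picks.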
Context management and bootstrapping
- The model only knows what’s in the chat history (or pre-seeded context such as AGENTS.md / CLAUDE.md files). If context is preloaded, the model may skip exploratory calls.
- Tool calls are cheap; a harness can fetch just the pieces the model needs, so you rarely have to copy an entire codebase into context.
- Large context windows or stuffing entire repos often degrade accuracy; prefer targeted tool-driven context building.
Security & UX
- Harnesses should implement permission checks for destructive operations.
- Different UIs/harnesses handle prompting and permissions differently.
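One way to gate destructive tools is to intercept calls before dispatch. The DESTRUCTIVE set and the injected confirm callback below are assumptions for the sketch, not any particular harness's design:

```python
# Sketch of a permission gate for destructive tool calls.
DESTRUCTIVE = {"edit_file", "bash"}  # assumed set of side-effecting tools

def run_tool_with_permission(name, args, tools, confirm):
    # `tools` maps tool names to callables; `confirm` is a callable
    # (e.g. an input() prompt in a CLI) injected so UIs can differ.
    if name in DESTRUCTIVE and not confirm(f"Allow {name} with {args}?"):
        return "Permission denied by user."
    return tools[name](**args)
```

Returning the denial as a plain string matters: it goes back into the model's context, so the model can adjust its plan instead of crashing the loop.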
Implementation reality (practical pattern)
- A simple, functional harness can be small (tens to a few hundred lines).
- Typical components shown in the demo:
- read_file(path) → returns file content (e.g., JSON)
- list_files(dir) → returns filenames/types
- edit_file(path, old, new) → replace or write
- a tool registry and a system prompt instructing the model to emit lines like: tool: name
- a loop: send messages to model, parse tool-invocations, run tools, append outputs, re-call the model
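Those components could be wired together roughly as follows. The decorator-based registry and the "tool: name {json}" line format are assumptions modeled on the demo's description, not its literal code:

```python
# Sketch of the demo's tool registry plus the parse/run/append step.
import json

TOOLS = {}

def tool(fn):
    TOOLS[fn.__name__] = fn  # register each tool under its function name
    return fn

@tool
def echo(text):
    return text  # trivial example tool for the sketch

def parse_tool_line(line):
    # Expect: 'tool: <name> <json-args>'; return None for a normal answer.
    if not line.startswith("tool:"):
        return None
    name, _, raw_args = line[len("tool:"):].strip().partition(" ")
    return name, json.loads(raw_args or "{}")

def step(model_reply, history):
    # One turn of the loop: either execute a tool (and signal the caller
    # to re-query the model) or pass the final answer through.
    parsed = parse_tool_line(model_reply.strip())
    if parsed is None:
        return model_reply
    name, args = parsed
    output = TOOLS[name](**args)  # the harness performs the side effect
    history.append({"role": "tool", "name": name, "content": output})
    return None
```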
Product, feature, and analysis highlights
Benchmark finding
- Moving the same models into a harness produced meaningful improvements (example: Opus from 77% to 93% in a benchmark by Matt Mayer).
Cursor vs. provider-native harnesses
- Cursor invests heavily in prompt/tool tuning, vector indexing/search, and per-model adjustments. That micro-tuning is why hosted UIs like Cursor often outperform raw provider UIs.
Model differences
- Different models (Claude, Opus, Gemini, GPT variants) behave differently even under the same harness. Harness authors must tune tool descriptions and prompts per model.
The harness controls the model’s view
- Returning fake tool outputs or modifying tool docs will change the model’s behavior — the model trusts the harness-provided data.
T3 Code vs. harness
- T3 Code is a UI that can surface multiple harnesses (e.g., Claude Code, Codex). T3 Code itself doesn’t implement tools; it wraps harnesses installed on the client machine.
Best practices (summary)
- Provide clear tool docs in the system prompt and iterate/tune them.
- Prefer targeted tool-driven context building over stuffing whole repos.
- Implement permission checks for dangerous edits.
- Use search/indexing to help the model find relevant files rather than dumping large contexts.
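A targeted search tool in that spirit might look like this sketch. The name search_files is hypothetical, and real harnesses like Cursor use vector indexes rather than a plain regex scan:

```python
# Sketch of a simple text-search tool for targeted context retrieval.
import os
import re

def search_files(pattern, directory="."):
    # Return (path, line_no, line) matches so the model can then read
    # only the files it actually needs.
    hits = []
    rx = re.compile(pattern)
    for root, _dirs, files in os.walk(directory):
        for fname in files:
            path = os.path.join(root, fname)
            try:
                with open(path, encoding="utf-8") as f:
                    for i, line in enumerate(f, 1):
                        if rx.search(line):
                            hits.append((path, i, line.rstrip()))
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
    return hits
```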
Tutorials, guides, and references mentioned
Two recommended write-ups for building harnesses:
- AMP team article (April, previous year) — shows a simple harness mental model and implementation.
- Mah’s article “The Emperor Has No Clothes” — demonstrates a straightforward, small Python harness.
Video’s hands-on demo
- Builds a minimal Python harness (tool registry, system prompt, loop).
- Demonstrates:
- running with read/list/edit tools,
- switching to a bash-only tool and shrinking code,
- how changing tool descriptions affects model choices,
- how CLAUDE.md / AGENTS.md files can be used to pre-seed context.
Historical/other approaches referenced
- Repomix (squashing entire repos into context; now discouraged)
- vector indexing/search (used by Cursor)
Security, limitations, and caveats
- Models only generate text; the harness is responsible for actual side effects — harness security matters a lot.
- Models can be misled by incorrect tool outputs or misleading tool docs; harnesses should validate outputs where possible.
- Provider restrictions vary: some providers restrict using your paid account with third-party tools (Anthropic, Google), while others (OpenAI) are more permissive — this affects which harnesses you can use with your subscription.
References, people, and sources cited
People and authors
- Matt Mayer — benchmark showing harness-driven improvements
- Mah — author of “The Emperor Has No Clothes”
- AMP team — guide/article on building harnesses
- Video speaker / YouTube channel host — main narrator and demonstrator
Products/platforms referenced
- Claude Code (Anthropic), Cursor, Codex, T3 Code (UI)
- OpenRouter, OpenAI, Gemini, Opus, Sonnet
- Repomix (historic)
- Macroscope (sponsor — AI code reviewer/insights tool)
Notes about the demo code pattern
The video includes a hands-on Python demo implementing the pattern described above (tool stubs, system prompt example, parsing loop). A compact snippet or checklist can be extracted from that demo for building a small harness: read/list/edit tools, a tool registry, and a loop that parses "tool: name" lines, runs the tool, appends the output, and re-queries the model.