Summary of "Delete your CLAUDE.md (and your AGENT.md too)" — key technical points, findings, and practical guidance
Main finding (paper + demos)
Recent benchmarking work evaluated the effect of repo-specific context files (CLAUDE.md, AGENT.md, or other developer "instruction" files) on coding agents' ability to resolve real-world GitHub issues. The study compared three conditions per repo:
- Developer-provided instruction file present
- Developer-provided file removed
- LLM-generated instruction file created just prior to the task
Results (robust across multiple LLMs and prompts):
- Developer-provided files gave only a small average improvement (~+4%).
- LLM-generated instruction files slightly decreased performance (~–3%).
- Context files increased agent exploration, testing, and reasoning, raising costs by more than 20%.
Practical experiment shown by the presenter (on his “lawn” project) matched the study: an automatically generated CLAUDE.md made the agent slower and more likely to reach for irrelevant technologies mentioned in the file.
Key result: developer-provided context helps only a little, automatically generated context can hurt, and context files increase agent runtime and cost.
Context management — why CLAUDE.md / AGENT.md matter
- Chat context hierarchy (priority order):
  - Provider instructions (highest priority)
  - System prompt
  - Developer message (agent / CLAUDE.md files)
  - User messages / conversation history
- The developer message layer sits between the system and user prompts and can therefore strongly bias agent behavior.
- Large or out-of-date agent MD files can distract the model or steer its "autocomplete" toward irrelevant details (for example, biasing the agent to use tRPC even when the repo uses a different stack).
- Agents can often find needed information by scanning the repo (package.json, schemas, source files). Overloading the context is usually unnecessary and costly.
Recommendations / best practices
Default guidance:
- Keep agent MD / CLAUDE.md minimal. Prefer omitting generated context files unless they solve a clear, persistent problem.
- Use agent MD only to steer repeated, specific misbehaviors (e.g., “always run type checks” if the agent consistently forgets). Otherwise, change the codebase or tooling.
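A minimal agent MD in this spirit contains only the instructions that correct observed, repeated failures — a hypothetical sketch (the specific rules are illustrative, not from the video):

```markdown
# CLAUDE.md — keep this file short and specific

- Always run the type checker before declaring a task done.
- Run the test suite after changing shared modules.
- Do not add new dependencies without asking first.
```

Everything else (stack, file layout, schemas) the agent can discover by reading the repo itself.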
Improve the repo and tooling to make correct agent behavior easier:
- Better unit/integration tests, stronger type checking, and CI hooks
- Clear file and dependency layout; tool wrappers that include desired checks
- Feedback systems that detect when an agent’s change breaks other parts
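One way to encode these checks in tooling rather than in a context file is a git pre-commit hook — a hedged sketch, where `npm run typecheck` and `npm test` are placeholder commands for whatever your repo actually uses:

```shell
#!/bin/sh
# Hypothetical .git/hooks/pre-commit: enforce the checks automatically
# instead of asking the agent to remember them via CLAUDE.md.
npm run typecheck || exit 1
npm test || exit 1
```

Because the hook blocks a bad commit outright, the agent gets direct feedback instead of relying on an instruction it may ignore.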
Practical debugging approach:
- Delete or temporarily remove the agent MD and compare behavior and latency.
- Iteratively add only the minimal instructions that correct persistent failures.
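The delete-and-compare experiment can be sketched as a quick shell A/B test. This assumes the Claude Code CLI's non-interactive `claude -p` mode; substitute your agent's equivalent, and note the task string is a placeholder:

```shell
# Hypothetical A/B test: same task with and without the context file.
TASK="summarize how authentication works in this repo"

# Run 1: context file present
time claude -p "$TASK" > with_md.log

# Run 2: hide the context file, rerun, then restore it
mv CLAUDE.md CLAUDE.md.bak
time claude -p "$TASK" > without_md.log
mv CLAUDE.md.bak CLAUDE.md

# Compare the wall times printed above, and diff the transcripts
wc -l with_md.log without_md.log
```

Differences in wall time and transcript length give a rough signal of whether the file is helping or just adding overhead.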
Useful engineering hacks the presenter shared:
- Intentionally mislead the agent (e.g., say “project is greenfield” or “no users”) to avoid unnecessary backfill logic.
- Ask the agent to perform a later step (e.g., “ask for step 3”) so it attempts step 2 and unblocks itself.
- Let the agent propose edits to the agent MD, then review those proposals to learn where it’s confused — apply fixes to the codebase rather than keeping large docs.
Caveat:
- Beware stale docs — an out-of-date agent MD is worse than none. Freshly generated instruction files can still hurt.
Demo highlights
- Live init of CLAUDE.md for the presenter’s repo showed the generated file contained many repo details the agent had already found.
- Running the same exploratory task with and without the generated file showed the run with the file took longer (approx. 1m29s vs 1m11s in the presenter’s test).
- Agent behavior differences:
  - Files can make the agent quicker to name file paths.
  - Files can also slow the agent overall and make it more likely to pursue the paths suggested in the MD even when inappropriate.
Product / service note (sponsor)
Daytona — elastic containers / secure sandboxes for running agents:
- Full compute with TypeScript + Python bindings
- Docker/Kubernetes snapshots, virtual desktops / OS automation
- Instant spin-up / spin-down and low hourly pricing (presenter quoted approximate numbers)
- Use case: give agents a full remote machine to run, edit, test, and commit code
Open questions and future topics
- Further discussion promised on “skills” (skill bench, how skills impact agent behavior) and other context-management issues.
- Encouragement to experiment: try new models and iteratively remove or minimize agent MD to see what actually helps.
Concrete takeaways
- Don’t assume repository instruction files are universally beneficial — they often cost time and money and can bias agents incorrectly.
- Start with minimal or no agent MD; only add narrowly scoped guidance to correct repeatable failures.
- Invest in repo structure, tooling, tests, and feedback loops — these yield bigger, more robust wins than large context files.
Main speakers / sources
- Video presenter (YouTuber / developer explaining the topic and running demos)
- Research paper / study benchmarking context files for coding agents (unnamed in transcript)
- Mentions / context: Claude (Anthropic), Claude Code / claude.ai, OpenAI, and other LLMs (multiple models tested)
- Sponsor / product: Daytona (elastic agent sandboxes)
- Community mention: “Chad” (commenter referenced)