Summary of "Cursor, Claude Code and Codex all have a BIG problem"
High-level thesis
Modern AI-first developer tools (Cursor, Claude Code, Codex, and similar) demonstrate impressive model capabilities but deliver a poor, unstable developer experience. Many were dogfooded and shipped on early, weaker models with brittle engineering practices, producing persistent "slopfests": nondeterministic behavior, UI/UX bugs, and bad patterns that propagate and multiply across codebases.
Key product complaints and concrete examples
Cursor
- Removed a useful agent/editor toggle and replaced it with brittle layout templates; sidebars move unpredictably.
- Frequent UI regressions (e.g., changed hotkeys), persistent bugs (including email leaks), and overall instability make the IDE frustrating.
- Forking and maintaining a large VS Code–derived codebase has created heavy maintenance pain; a restart might be preferable to continued divergence.
Claude Code (Anthropic)
- CLI/terminal UX is more nondeterministic and buggy than expected:
  - Input lag and race conditions when pasting images (attachments are processed asynchronously and can attach to the wrong messages).
  - Context-limit failures and "thread deaths."
- Early internal dogfooding with Sonnet 3.5/3.7 contributed to long-term problems.
Codex (OpenAI)
- Tends to search and copy patterns from large codebases aggressively, which can propagate suboptimal or unsafe patterns simply because they exist in a repo.
Performance & engineering choices
- Some teams have taken extreme measures to address inefficiency and technical debt (example: acquiring the Bun team to fix runtime performance), illustrating how deep the problems can be.
Sponsor / product review: Augment (index + retrieval engine)
- What it does: CLI-based code indexing and a retrieval engine that plugs into agents and IDEs.
- Claimed benefits: instant, highly accurate retrieval of relevant code across very large codebases; faster and more precise than raw model search.
- Demo takeaway: integrating Augment with an agent (the host used it with Codex) reduced search time from minutes to under ~20 seconds and improved accuracy when tracing logic (for example, subscription logic).
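Augment's internals are not shown in the video, but the index-then-retrieve idea behind this kind of tool can be sketched with a toy inverted index. Everything below (the file extensions, the tokenizer, the ranking) is this sketch's own assumption, not Augment's actual design:

```python
import os
import re
from collections import defaultdict

def build_index(root: str) -> dict:
    """Toy inverted index: maps lowercase word tokens to the source files
    that contain them. Real engines also chunk, embed, and rank code."""
    index = defaultdict(set)
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith((".py", ".ts", ".js")):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    text = f.read()
            except OSError:
                continue
            # Letter runs split identifiers like subscription_status
            # into plain words ("subscription", "status").
            for token in set(re.findall(r"[A-Za-z]{3,}", text)):
                index[token.lower()].add(path)
    return index

def retrieve(index: dict, query: str) -> list:
    """Rank indexed files by how many query tokens they contain."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]{3,}", query)]
    scores = defaultdict(int)
    for token in tokens:
        for path in index.get(token, ()):
            scores[path] += 1
    return sorted(scores, key=scores.get, reverse=True)
```

The point of indexing up front is that `retrieve` becomes a dictionary lookup instead of an agent spending minutes grepping and reading files on every question.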
Technical analysis — why this keeps happening
- Codebase inertia
  - A codebase's "peak" quality often appears within ~3–6 months; after that, patterns harden and improvements become much harder.
- Exponential spread of bad patterns
  - Good parts grow linearly; bad patterns propagate exponentially. Agents accelerate this by sampling existing code.
- Early model-era contamination
  - Many tooling and code artifacts were written with older models (Sonnet 3.5/3.7), producing lower-quality code that became the basis for newer agents.
- Non-determinism
  - Models rarely fail the same way twice, making UX failures unpredictable and compounding developer frustration.
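The linear-vs-exponential claim above can be made concrete with a deliberately simplified model (every rate here is invented for illustration): humans add a constant amount of good code per round, while an agent copies each existing bad pattern into new call sites at a fixed percentage rate, so the bad patterns compound.

```python
def simulate(rounds: int, good_per_round: float = 10.0,
             copy_rate: float = 0.15, bad_seed: float = 1.0) -> list:
    """Toy model of pattern spread. Good code grows linearly (constant
    human output); bad patterns compound, because agents sample what
    already exists, so each copy creates more material to copy next round.
    Returns (good, bad, bad_fraction) per round."""
    good, bad = 0.0, bad_seed
    history = []
    for _ in range(rounds):
        good += good_per_round      # linear: constant human output
        bad *= 1.0 + copy_rate      # compound: agents resample what exists
        history.append((good, bad, bad / (good + bad)))
    return history
```

With these made-up rates, the bad patterns start as a tenth of one round's good output yet overtake the entire good codebase within about fifty rounds, which is the "slopfest" dynamic the video describes.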
Actionable recommendations / guide for teams using AI agents
- Prioritize clarity and speed in code layout
  - Make small changes touch few files; avoid architectures where tiny changes are expensive.
  - Prefer patterns and frameworks that reduce surface area (example cited: Tailwind).
- Tolerate nothing
  - Prevent bad patterns from entering the codebase; "later" rarely happens, so fix it immediately or delete it.
- Use sledgehammer rewrites when appropriate
  - If a module is deeply broken, deleting and rewriting it (now more feasible with AI-generated code) can be cheaper than incremental patching.
- Spend more time in plan/spec mode
  - Use the model to co-design a precise plan or markdown spec before generating code; read and validate the plan.
- Use the latest, best models
  - Upgrade to newer models (examples cited: Opus 4.6, Codex 5.3) rather than sticking with older, constrained models.
- Isolate or spin up new repos/services
  - Avoid adding ad-hoc features to core production code; make it easy to create new internal repos and services.
  - Incentivize new projects instead of stuffing features into the main codebase.
- Ask the agent "why" and trace provenance
  - If an agent produced a bad pattern, ask where it sourced the idea (your codebase, docs, etc.) and remove the source.
- Consider dual-track codebases
  - Prototype rapidly in a "vibe-code"/slop version for fast experimentation, then port validated ideas into a cleaned, production-ready codebase (analogy: Vampire Survivors used Phaser.js for rapid iteration, then was ported to C++).
- Measure maintainability by agent transparency
  - If an agent can't explain a feature in under ~3 minutes, the codebase likely needs refactoring.
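The plan/spec recommendation above can be turned into a mechanical gate. As one hedged illustration, a pre-codegen check might refuse to hand a markdown plan to an agent until it contains the sections the team requires; the section names below are this sketch's own convention, not anything the video prescribes:

```python
import re

# Hypothetical team convention: headings every plan/spec must contain.
REQUIRED_SECTIONS = ("Goal", "Files to touch", "Out of scope", "Test plan")

def missing_sections(plan_markdown: str) -> list:
    """Return the required headings absent from a markdown plan document.
    Run this before handing the plan to a code-generating agent, and send
    the plan back for revision if the list is non-empty."""
    headings = {m.strip() for m in
                re.findall(r"^#+\s*(.+?)\s*$", plan_markdown, flags=re.M)}
    return [s for s in REQUIRED_SECTIONS if s not in headings]
```

A "Files to touch" section also supports the small-changes-touch-few-files recommendation: if the plan lists many files for a tiny feature, that is a layout smell worth catching before any code is generated.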
Practical patterns the host uses / internal tooling
- Uses "sledgehammer" rewrites; for example, migrated from custom sync solutions to Convex once a codebase hit its quality plateau.
- Iterative PRs and prototypes:
  - Create rough working UX/solutions, then have the team implement the robust version.
- Working on T3 Chat / T3 code to build a more stable agent-interaction layer and to dogfood new approaches.
Models & tooling names referenced
- Models: Sonnet 3.5 and Sonnet 3.7 (criticized as early/weak); Opus 4.6 and Codex 5.3 (referenced positively).
- Products/companies: Cursor; Claude Code (Anthropic); Codex (OpenAI); Augment (sponsor); Bun (runtime acquisition); T3 (T3 Chat, T3 code).
- Example case: Vampire Survivors (Phaser.js prototype → C++ production).
Bottom line
AI-powered coding tools enable rapid iteration, but if teams dogfood immature models and allow low-quality patterns into core repos, the result is brittle, nondeterministic tooling and exploding technical debt. Countermeasures include strict code hygiene, planning/specification with models, selective rewrites, isolating prototypes from production, and upgrading to current models and better retrieval (for example, Augment).
Main speaker / sources
- Video narrator/creator (host): an early investor in Cursor, involved with T3 (T3 Chat / T3 code); the primary commentator and source of the opinions in the video.
- Companies/products discussed: Cursor; Claude Code (Anthropic); Codex (OpenAI); Augment; Bun; T3.
- Models mentioned as context: Sonnet 3.5, Sonnet 3.7, Opus 4.6, Codex 5.3.