Summary of "Full Walkthrough: Workflow for AI Coding — Matt Pocock"
Main ideas / lessons conveyed
- AI coding still benefits from classic software engineering fundamentals. The workshop argues that AI does not replace software engineering practice; established principles still apply, such as:
  - task sizing
  - iterative feedback
  - testing discipline
  - maintainable design
- LLMs have two practical constraints:
  - “Smart zone” vs “dumb zone” (context limits): performance degrades as more conversational context (tokens) accumulates in the same session.
  - “Forgetting” when context is cleared: the model resets to its system prompt, and each new session tends to pass through the same phases.
- Planning and implementation must be handled differently:
  - Planning/alignment must be human-in-the-loop (you actively ask and answer questions to reach a shared understanding).
  - Implementation can often be AFK (away from keyboard: the human leaves the loop) once the work is clearly structured and broken into tasks.
- Avoid “specs-to-code” as the primary approach. The speaker criticizes workflows where you repeatedly edit specifications and ignore the code, calling this “vibe coding by another name.” The code remains the battleground.
- Use a structured alignment process (“grill me”) to reach a shared design concept rather than just generating a plan.
- Turn the design concept into destination artifacts (not endless planning):
  - write a PRD (Product Requirements Document) as the destination document
  - then transform the PRD into actionable tasks (Kanban + issues)
- Use “vertical slices / tracer bullets” instead of horizontal slicing.
  - Horizontal: build each layer in full, one at a time (DB, then API, then frontend), so integration feedback arrives only at the end and the AI can “code blind.”
  - Vertical: implement thin end-to-end slices across layers to get fast feedback on real behavior.
- Implementation quality depends heavily on feedback loops:
  - Use TDD (red → green → refactor) to pin down correct behavior and add meaningful tests.
  - Include an AI review step after implementation, structured so the reviewer stays in the “smart zone.”
- Code architecture matters for agent performance:
  - Prefer deep modules (a small interface, lots of internal functionality, clear test boundaries) over shallow modules (many tiny parts with complex dependency graphs).
  - Improve module structure so feedback loops work better.
Methodology / workflow (detailed bullet list)
1) Start with AI-alignment and context management
- Use “smart zone / dumb zone”:
  - keep tasks and conversations small enough to stay out of the dumb zone
  - expect degradation at around 100K tokens (the speaker’s rule of thumb)
- Treat each LLM session as having stages:
  - system prompt (keep minimal)
  - exploration
  - implementation
  - testing/feedback
- Prefer workflows where you can clear context and return to a consistent starting state (the Memento reference), rather than relying on “compacting.”
- Track token usage during sessions (the speaker recommends monitoring exact token counts).
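The context rules in this step can be sketched as a small token budget. This is a minimal illustration, not the speaker's tooling: every name is hypothetical, and the 100,000-token default only encodes the rule of thumb quoted above.

```typescript
// Hypothetical sketch: track token usage per session and report the zone.
type ContextZone = "smart" | "dumb";

interface SessionBudget {
  usedTokens: number;  // tokens consumed so far in this session
  limitTokens: number; // assumed point where quality starts to degrade
}

function newSessionBudget(limitTokens: number = 100000): SessionBudget {
  return { usedTokens: 0, limitTokens: limitTokens };
}

// Add the token count reported for one turn and return the updated budget.
function recordTurn(budget: SessionBudget, turnTokens: number): SessionBudget {
  return {
    usedTokens: budget.usedTokens + turnTokens,
    limitTokens: budget.limitTokens,
  };
}

function currentZone(budget: SessionBudget): ContextZone {
  return budget.usedTokens >= budget.limitTokens ? "dumb" : "smart";
}

// Clearing context returns the session to a consistent starting state.
function clearContext(budget: SessionBudget): SessionBudget {
  return newSessionBudget(budget.limitTokens);
}
```

A session would call recordTurn after each exchange and clear context once currentZone reports "dumb", rather than compacting.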
2) Alignment phase: use “Grill Me” to create a shared design concept (human in the loop)
- Trigger a “grill me” skill by invoking it in the repo workflow and providing a client brief (example: gamification for a course retention problem).
- The skill’s behavior:
  - interviews you relentlessly until shared understanding is reached
  - walks through a decision tree question by question
  - provides a recommended answer for each question
- Maintain alignment through interactive Q&A (example questions cover scoring rules, whether progress should be retroactive, and the progression curve/levels).
- Let the grilling continue until convergence (this can take dozens of questions).
Key principle: you’re not merely “getting a plan”; you’re achieving shared understanding between developer and AI.
3) Turn the design concept into destination documents (PRD)
- After grilling, use “write PRD.”
- PRD purpose (“destination document”):
  - summarize what you decided during grilling
  - include:
    - problem statements
    - solution outline
    - user stories
    - implementation decisions
    - testing decisions
    - a definition of done
- The speaker emphasizes that early PRDs generally don’t need a thorough correctness review: the value is synchronization and summary, not deep validation yet.
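One way to picture the PRD sections listed above is as a typed record with a completeness check. This is a sketch only: the field names mirror the list, but the type and the helper are hypothetical and are not the output format of the “write PRD” skill.

```typescript
// Hypothetical shape for a PRD; fields mirror the sections listed above.
interface PrdDraft {
  problemStatements: string[];
  solutionOutline: string;
  userStories: string[];
  implementationDecisions: string[];
  testingDecisions: string[];
  definitionOfDone: string[];
}

const PRD_SECTIONS: Array<keyof PrdDraft> = [
  "problemStatements",
  "solutionOutline",
  "userStories",
  "implementationDecisions",
  "testingDecisions",
  "definitionOfDone",
];

// Return the names of sections that are still empty, in declaration order.
function missingPrdSections(prd: PrdDraft): string[] {
  return PRD_SECTIONS.filter((section) => {
    const value = prd[section];
    return typeof value === "string" ? value.trim() === "" : value.length === 0;
  });
}
```

The check only confirms that each section exists, which matches the point that an early PRD is a summary to synchronize on rather than something to validate deeply.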
4) Convert PRD into an executable task structure (Kanban + issues)
- Convert the PRD into a Kanban board of tasks with blocking relationships.
- Classify tasks as:
  - AFK tasks (can be delegated to agents)
  - Human-in-the-loop tasks (planning/alignment remains human-driven)
- Prefer parallelizable execution:
  - use Kanban “blocking” relationships to form a DAG-like execution plan
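The step from blocking relationships to a parallel execution plan can be sketched as grouping tasks into waves. This is an illustrative sketch under assumptions: the task shape, the function name, and the example task ids are all hypothetical, not taken from the workshop's tooling.

```typescript
// Hypothetical task shape for a Kanban board with blocking relationships.
interface KanbanTask {
  id: string;
  mode: "afk" | "human"; // who picks the task up; not used for ordering
  blockedBy: string[];   // ids of tasks that must finish first
}

// Group tasks into waves. Every task in a wave has all of its blockers in
// earlier waves, so the tasks inside one wave could run in parallel.
function planWaves(tasks: KanbanTask[]): string[][] {
  const finished: string[] = [];
  const waves: string[][] = [];
  let remaining = tasks.slice();
  while (remaining.length > 0) {
    const readyIds = remaining
      .filter((task) => task.blockedBy.every((dep) => finished.indexOf(dep) !== -1))
      .map((task) => task.id);
    if (readyIds.length === 0) {
      throw new Error("Blocking relationships contain a cycle or an unknown task id");
    }
    waves.push(readyIds);
    readyIds.forEach((id) => finished.push(id));
    remaining = remaining.filter((task) => readyIds.indexOf(task.id) === -1);
  }
  return waves;
}
```

For example, if "streaks" and "levels" are both blocked only by "points", they land in the same second wave and could be handed to two agents at once.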
5) Use tracer bullets / vertical slices to avoid blind integration
- Break work into tracer-bullet vertical slices:
  - each slice crosses multiple layers (schema, service, UI)
  - each slice delivers early integrated feedback
- Rule of thumb:
  - the first slice should be visible and testable end-to-end (e.g., points awarded are reflected in a dashboard), not just a horizontal chunk like “DB schema alone.”
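The difference between a vertical slice and a horizontal chunk can be expressed as a simple check on which layers a proposed task touches. This is a sketch with hypothetical names; it assumes a three-layer stack and treats a slice as vertical only when it touches all three, which is stricter than “multiple layers.”

```typescript
// Hypothetical three-layer stack, matching the layers named above.
type StackLayer = "schema" | "service" | "ui";

interface SliceProposal {
  title: string;
  layers: StackLayer[]; // layers this piece of work touches
}

const ALL_STACK_LAYERS: StackLayer[] = ["schema", "service", "ui"];

// Assumption: a slice is vertical only if it touches every layer.
function isVerticalSlice(slice: SliceProposal): boolean {
  return ALL_STACK_LAYERS.every((layer) => slice.layers.indexOf(layer) !== -1);
}
```

Under this check, “award points and show them on the dashboard” passes, while “DB schema alone” does not.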
6) Implementation phase: delegate to AFK agents (Ralph loop)
- Once the issue/backlog structure is ready, switch to an AFK implementation loop:
  - provide local issue files and repo state to the agent
  - the agent selects tasks sequentially (in the demo) and executes them
- Typical task completion instructions:
  - explore the repo
  - use TDD to complete tasks
  - run feedback loops (tests + type checks)
- Use a Docker sandbox for safe, reproducible execution (the speaker avoids asking attendees to install Docker manually).
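The sequential loop described in this step can be sketched as follows. This is a minimal sketch, not the workshop's script: the agent run is passed in as a hypothetical `attempt` function so that no real agent CLI or Docker command is assumed, and all type and function names are illustrative.

```typescript
// Hypothetical issue and result shapes for an AFK implementation loop.
interface LoopIssue {
  id: string;
  done: boolean;
}

interface AttemptResult {
  testsPass: boolean;
  typesPass: boolean;
}

function firstOpenIssue(issues: LoopIssue[]): LoopIssue | null {
  for (const issue of issues) {
    if (!issue.done) return issue;
  }
  return null;
}

// `attempt` stands in for one agent run on one issue (explore, TDD, run checks).
// The loop caps the number of runs so an AFK session cannot spin forever.
function runIssueLoop(
  issues: LoopIssue[],
  attempt: (issue: LoopIssue) => AttemptResult,
  maxIterations: number
): string[] {
  const completed: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const next = firstOpenIssue(issues);
    if (next === null) break; // backlog is empty
    const result = attempt(next);
    if (result.testsPass && result.typesPass) {
      next.done = true;
      completed.push(next.id);
    }
    // On failure the issue stays open and is retried on the next iteration.
  }
  return completed;
}
```

An issue is only marked done when both feedback loops pass, so a failing run leaves the backlog unchanged for the next iteration.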
7) Ensure quality with automated feedback loops during implementation
- After implementing:
  - run tests and type checks
  - fix failures inside the agent loop
- Use AI review:
  - have the agent perform a code review step after writing code
  - structure the step so that the reviewer stays in the “smart zone” even after context is cleared
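The red, green, refactor cycle that the agent is asked to follow can be illustrated with a small scoring function. This is a sketch under assumptions: the action names and point values are invented for illustration, and plain throwing checks are used instead of Vitest so the example stays self-contained.

```typescript
// All names and point values are hypothetical, chosen only to show the cycle.
type LearnerAction = "lesson_completed" | "quiz_passed";

// Red: the checks are written first and fail until the code below
// exists and returns the expected totals.
function runScoringChecks(): void {
  expectPoints(totalPoints([]), 0, "no actions earn no points");
  expectPoints(totalPoints(["lesson_completed"]), 10, "one lesson");
  expectPoints(totalPoints(["lesson_completed", "quiz_passed"]), 35, "lesson plus quiz");
}

function expectPoints(actual: number, expected: number, label: string): void {
  if (actual !== expected) {
    throw new Error(label + ": expected " + expected + ", got " + actual);
  }
}

// Green: the simplest implementation that satisfies the checks.
const POINTS_BY_ACTION: Record<LearnerAction, number> = {
  lesson_completed: 10,
  quiz_passed: 25,
};

function totalPoints(actions: LearnerAction[]): number {
  return actions.reduce((sum, action) => sum + POINTS_BY_ACTION[action], 0);
}

// Refactor: restructure freely, then re-run the checks to confirm that
// behavior is unchanged.
runScoringChecks();
```

In a real repo the same checks would live in a test file and run alongside the type checker as the agent's feedback loop.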
8) Manual QA and “taste” enforcement
- Manual QA is still essential:
  - humans re-impose expectations/taste
  - confirm behavior in real flows
- Automated QA alone isn’t sufficient because product quality and UI/behavior require human judgment.
9) Continuous improvement of codebase architecture for agent effectiveness
- If the codebase is “bad” or hard for agents to work in:
  - improve modularity so feedback loops work better
- Apply the deep modules strategy (Ousterhout-inspired):
  - small, clear interfaces
  - deeper internal functionality
  - avoid “shallow module” patterns that create fragile testing boundaries and complex dependency graphs
- Use an “improve code base architecture” capability to scan the codebase and propose where to deepen modules.
- Example outcome: identify missing tests (e.g., for the scoring service) and propose extraction/restructuring.
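A deep module in the sense described above might look like the following. This is a sketch with assumptions stated up front: the tracker, its two methods, and the linear level rule are hypothetical and are not taken from the workshop's codebase.

```typescript
// The whole public interface: two methods. Tests target only this boundary.
interface ProgressTracker {
  record(learnerId: string, points: number): void;
  levelOf(learnerId: string): number;
}

// Assumption: levels grow linearly, one level per `pointsPerLevel` points.
function createProgressTracker(pointsPerLevel: number): ProgressTracker {
  // Internal state and helpers stay hidden behind the interface.
  const totals: { [learnerId: string]: number } = Object.create(null);

  function validate(points: number): void {
    if (!(points >= 0) || !isFinite(points)) {
      throw new Error("points must be a non-negative finite number");
    }
  }

  function levelFor(total: number): number {
    return 1 + Math.floor(total / pointsPerLevel);
  }

  return {
    record(learnerId, points) {
      validate(points);
      totals[learnerId] = (totals[learnerId] || 0) + points;
    },
    levelOf(learnerId) {
      return levelFor(totals[learnerId] || 0);
    },
  };
}
```

Because validation, storage, and the level rule sit behind two methods, an agent can rewrite the internals freely while the tests against the interface keep giving a reliable signal.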
10) Parallelization and merging (scaling beyond one agent)
- Use an orchestrator to:
  - plan issues for parallel work
  - run multiple sandboxed implementer agents
  - merge results afterward
- The workshop references a workflow enabled by a tool/library called Sandcastle, which:
  - creates worktrees/sandboxes per issue
  - runs implementers per branch
  - reviews per-branch changes
  - merges with a “merger agent” that resolves conflicts and fixes test/type issues
Instructions explicitly referenced / principles named
- “Don’t bite off more than you can chew”
- Martin Fowler / refactoring advice (don’t overreach per step)
- Pragmatic Programmer advice (keep tasks small)
- Frederick P. Brooks’s idea of a shared design concept (The Design of Design)
- Ralph Wiggum software practice metaphor (an end-state PRD plus “small changes get closer,” but more structured)
- Tracer bullets / vertical slices (Pragmatic Programmer)
- TDD: red → green → refactor
- Deep modules vs shallow modules (John Ousterhout)
Speakers / sources featured (as mentioned in the subtitles)
Speaker(s)
- Matt Pocock: workshop presenter; described as a teacher who teaches AI
Other people / referenced individuals
- Mike: mentioned by Matt during early logistics and a question check
- Dex Horthy: runs HumanLayer; proposed the smart zone / dumb zone concept for LLMs
- Martin Fowler
- The Pragmatic Programmer
- Ralph Wiggum
- Frederick P. Brooks (The Design of Design)
- Memento (metaphor for forgetting/resetting)
- Steve: named while pulling up slides on a tldraw canvas
- Sarah Chen: author of the client brief in the exercise example (“Claude always chooses Sarah Chen”)
Sources / works / frameworks referenced
- Refactoring (Martin Fowler)
- The Design of Design (Frederick P. Brooks)
- “Tracer bullets” / vertical slices (Pragmatic Programmer concept)
- A Philosophy of Software Design (John Ousterhout)
- Specs-to-code movement (criticized)
Tools / tech referenced
- Claude Code / Claude
- Opus / Sonnet (model choices for different roles)
- Gemini
- Slido
- Docker
- npx / Vitest (referenced as npx vitest-style testing)
- TypeScript
- GitHub Issues
- Kanban
- Sandcastle (tool/library introduced by Matt)
Category
Educational