Summary of "LIVE: Watch me build a brand-new project from scratch"
Overview
A rare live stream where the speaker plans and “vibe-codes” the early architecture and vocabulary for a new greenfield project: a self-hosted “coding agent observability” platform. The exploration starts broad (e.g., ideas like an AI coding Kanban / token tracking dashboards) and then narrows to observability as the “missing layer.”
Core Product Idea: “Coding Agent Observability”
The goal is a system that helps individuals and/or teams see what coding agents do, including:
- Token/cost accounting per session (tokens spent, model usage, context window usage)
- Session success/productivity signals
- Model/tool observability (which models/prompts/tools were used)
- Leaderboards/metrics and possibly aggregate analytics for org-level comparisons
- Ability to drill into a specific engineer’s session for debugging/feedback (DRI = “directly responsible individual” for improving agent usage)
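As a rough illustration of the per-session token/cost accounting listed above, the sketch below totals tokens and cost from a list of usage records. The record fields (`session_id`, `model`, `input_tokens`, `output_tokens`), the model name, and the price table are all hypothetical; none of them come from the stream.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; real prices vary by provider.
PRICES = {"model-a": {"input": 3.0, "output": 15.0}}


@dataclass
class Usage:
    session_id: str
    model: str
    input_tokens: int
    output_tokens: int


def summarize(usages):
    """Return {session_id: {"tokens": int, "cost_usd": float}}."""
    totals = defaultdict(lambda: {"tokens": 0, "cost_usd": 0.0})
    for u in usages:
        price = PRICES[u.model]
        cost = (
            u.input_tokens * price["input"] + u.output_tokens * price["output"]
        ) / 1_000_000
        totals[u.session_id]["tokens"] += u.input_tokens + u.output_tokens
        totals[u.session_id]["cost_usd"] += cost
    return dict(totals)
```

A dashboard or leaderboard could then be built by grouping these per-session totals by developer or team.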
Key Differentiator: The “Missing Layer”
The argument: observability tooling already exists for AI applications, but nothing comparable covers “your own coding agents.” The proposal is to instrument agent runs and view them in one place within the organization.
Primary Target & UX Decisions
Prompted by the “Grill Me” questioning, they weigh three candidate UX focuses:
- A: Individual developer deep dive (personal timeline per session)
- B: Cohort/team dashboards (comparisons across developers)
- C: Aggregate metrics only
They converge on A, with support for B, because managers/DRIs need to review specific engineers’ sessions.
Privacy/Consent Model Discussion
Coding sessions may contain:
- Secrets
- PII
- Half-formed thoughts
Main direction:
- Prefer org-visible data with a clear internal trust/visibility model.
- Avoid per-session opt-in, which would reduce data quality.
- Strong preference for on-prem / self-hosted so data never leaves the organization.
Technical Core: “Ingestion Spine” (Hooks vs JSONL)
A major technical section addresses how to capture agent events reliably across many different coding agents. They explore:
- Hooks-based triggers (agent-provided lifecycle events)
- JSONL append-only files as a backstop
- Per-agent adapters, because agents expose different event surfaces and payload completeness
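To make the JSONL backstop concrete, here is a minimal sketch of reading newly appended lines from an append-only file while remembering a byte offset between calls. The function name `read_new_events` and the offset-based approach are assumptions for illustration; the stream does not specify how the file would be read.

```python
import json


def read_new_events(path, offset=0):
    """Read complete JSON lines appended after `offset`.

    Returns (events, new_offset). A trailing line without a newline is
    treated as still being written and is left for the next call.
    """
    events = []
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                # End of file, or a partially written line: stop here and
                # keep the offset before it so it is retried later.
                break
            offset = f.tell()
            if line.strip():
                events.append(json.loads(line))
    return events, offset
```

Calling it repeatedly with the previously returned offset yields each complete line once, which is what makes an append-only file usable as a fallback when hooks miss events.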
Coding agents explicitly considered
- Claude Code
- Cursor CodeX
- Pi
- OpenCode
- GitHub Copilot CLI
Findings
- No single universal hook payload is sufficient: some agents’ payloads include little or no message content.
- Adapters are unavoidable: each agent needs a specialized ingestion adapter to normalize events into the platform’s internal schema.
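The adapter idea can be sketched as a registry of per-agent functions that map raw payloads onto one internal event type. Everything here is hypothetical: the agents `agent-a` and `agent-b`, their payload keys, and the `NormalizedEvent` fields are invented for illustration and do not reflect any real agent's hook format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class NormalizedEvent:
    agent: str
    session_id: str
    kind: str
    text: Optional[str]  # None when the payload carries no message content


def adapt_agent_a(raw: dict) -> NormalizedEvent:
    # Hypothetical agent whose payload includes message content.
    return NormalizedEvent(
        "agent-a", raw["sessionId"], raw["event"], raw.get("message")
    )


def adapt_agent_b(raw: dict) -> NormalizedEvent:
    # Hypothetical agent whose hooks report only lifecycle metadata.
    return NormalizedEvent("agent-b", raw["sid"], raw["type"], None)


ADAPTERS: Dict[str, Callable[[dict], NormalizedEvent]] = {
    "agent-a": adapt_agent_a,
    "agent-b": adapt_agent_b,
}


def normalize(agent: str, raw: dict) -> NormalizedEvent:
    """Dispatch a raw payload to the adapter registered for its agent."""
    try:
        adapter = ADAPTERS[agent]
    except KeyError:
        raise ValueError(f"no adapter registered for agent {agent!r}")
    return adapter(raw)
```

Supporting a new agent then means adding one function and one registry entry, leaving storage and the dashboard untouched.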
Data Capture Architecture: Sidecar / Per-Session Capture
They settle on an approach after discovering hooks alone aren’t enough:
- Use a locally installed capture component that runs alongside the agent.
- A short-lived “sidecar” process is spawned by a hook at session start and ends when the session ends.
This reduces the operational risk of forgetting to run an always-on daemon.
Tradeoffs considered
- Concurrent sessions may spawn multiple capture processes.
Identity / Authentication Model (Self-hosted Backend)
They initially consider:
- OIDC (OpenID Connect) with an IDP (Identity Provider)
They pivot to a simpler v1 approach:
- Admin-minted per-user tokens (created via backend admin UI)
- Developers run a local install/login using the token
- Backend verifies bearer tokens
- Leaves a clean upgrade path and supports de-provisioning by revoking tokens
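A minimal sketch of the admin-minted token flow, assuming an in-memory store in place of a real backend table. The `TokenStore` class and its method names are hypothetical, and storing only a SHA-256 hash of each token is an added assumption rather than something decided on the stream.

```python
import hashlib
import secrets


class TokenStore:
    """In-memory stand-in for a backend table of per-user tokens."""

    def __init__(self):
        self._hash_to_user = {}

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode()).hexdigest()

    def mint(self, user_id: str) -> str:
        """Admin action: create a token for a user; only its hash is stored."""
        token = secrets.token_urlsafe(32)
        self._hash_to_user[self._digest(token)] = user_id
        return token

    def revoke_user(self, user_id: str) -> None:
        """De-provision a user by dropping all of their tokens."""
        self._hash_to_user = {
            h: u for h, u in self._hash_to_user.items() if u != user_id
        }

    def verify(self, authorization_header: str):
        """Return the user id for a valid 'Bearer <token>' header, else None."""
        scheme, _, token = authorization_header.partition(" ")
        if scheme.lower() != "bearer" or not token:
            return None
        return self._hash_to_user.get(self._digest(token.strip()))
```

The backend would call `verify` on each incoming request from a capture component and reject the request when it returns `None`.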
Backend / Server Architecture Directions
Open decisions include:
- Deployment unit: single binary vs multiple services (avoid Kubernetes initially)
- Storage preference: likely Postgres over SQLite
- Separation: backend/storage separated from the capture binary
- Live updates: they consider polling instead of strict 100ms streaming
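One way polling could replace streaming is cursor-based fetching, where the dashboard asks for events newer than the last sequence number it has seen. The `EventLog` class below is an in-memory, hypothetical stand-in for the Postgres-backed store; the method names and the cursor scheme are assumptions.

```python
class EventLog:
    """In-memory event log with monotonically increasing sequence numbers."""

    def __init__(self):
        self._events = []  # list of (seq, payload); seq starts at 1

    def append(self, payload):
        seq = len(self._events) + 1
        self._events.append((seq, payload))
        return seq

    def since(self, cursor, limit=100):
        """Return up to `limit` events with seq > cursor, plus the new cursor."""
        # seq is always index + 1, so slicing from `cursor` skips seen events.
        batch = self._events[cursor:cursor + limit]
        new_cursor = batch[-1][0] if batch else cursor
        return batch, new_cursor
```

A client would call `since(cursor)` on a timer and keep the returned cursor, so a missed poll only delays updates instead of losing them.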
Domain Modeling & “Ubiquitous Language” (DDD-inspired)
To structure the database and UI, they create a domain glossary, referencing:
- DDD (Domain-Driven Design)
- Ubiquitous language
Included concepts:
- Coding agent
- Session: one agent run attached to a dev, including working directory and agent version
- Turn: initial user message + full assistant response (later refined)
- Model request: one model invocation (HTTP call in their conceptual model)
Key structural concept: Session as a DAG of Turns
- A Session contains a DAG of turns to represent branching/forking.
- Most agents likely produce a straight-line sequence (a degenerate DAG).
- Some agents (notably Pi, per discussion) may produce real branching.
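The session-as-DAG idea can be sketched with turns that point at their parent. This is a simplified, hypothetical model: each turn has at most one parent, which makes the structure a tree (a special case of a DAG), and the class and field names are invented here rather than taken from the project's glossary.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Turn:
    id: str
    parent_id: Optional[str]  # None for the first turn in the session
    user_message: str


@dataclass
class Session:
    id: str
    turns: Dict[str, Turn] = field(default_factory=dict)

    def add_turn(self, turn_id: str, user_message: str,
                 parent_id: Optional[str] = None) -> Turn:
        if turn_id in self.turns:
            raise ValueError(f"duplicate turn id {turn_id!r}")
        # Parents must already exist, so no cycle can ever be formed.
        if parent_id is not None and parent_id not in self.turns:
            raise ValueError(f"unknown parent {parent_id!r}")
        turn = Turn(turn_id, parent_id, user_message)
        self.turns[turn_id] = turn
        return turn

    def leaves(self) -> List[str]:
        """Turn ids with no children: the tips of every branch."""
        parents = {t.parent_id for t in self.turns.values()}
        return [tid for tid in self.turns if tid not in parents]

    def is_linear(self) -> bool:
        """True when the session is a straight line (the degenerate case)."""
        counts = Counter(t.parent_id for t in self.turns.values())
        return all(n <= 1 for n in counts.values())
```

Adding a second child to an existing turn is how a fork would show up in this model; `leaves()` then returns one tip per branch, including abandoned ones.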
Edge cases iterated
- Forking/branch abandonment
- Rewinding turns (treated as branching in the DAG)
- Resumes/compactions: whether “resume” is the same session (as DAG markers) or a new session linked by parent session ID
- Sub-agents: modeled as child sessions with parent session relationships
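Linking sessions by parent session ID, as in the sub-agent case above, lets metrics roll up from child sessions to the session that spawned them. The sketch below assumes a hypothetical `SessionRecord` with a `tokens` field and assumes the parent links contain no cycles.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SessionRecord:
    id: str
    parent_session_id: Optional[str]  # None for a top-level session
    tokens: int


def total_tokens(sessions: List[SessionRecord], root_id: str) -> int:
    """Sum tokens for a session and all of its descendant sessions."""
    by_id: Dict[str, SessionRecord] = {s.id: s for s in sessions}
    children: Dict[str, List[str]] = {}
    for s in sessions:
        if s.parent_session_id is not None:
            children.setdefault(s.parent_session_id, []).append(s.id)

    total, stack = 0, [root_id]
    while stack:
        sid = stack.pop()
        total += by_id[sid].tokens
        stack.extend(children.get(sid, []))
    return total
```

The same parent link would also cover the "resume as a new session" option: a resumed session would simply be another child of the session it continues.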
“Buildable V1” Shape
They converge on an early architecture:
- User runs a local listener/capturer (plugin or installed component) near the agent.
- The listener sends session events to a self-hosted server.
- The server:
  - receives events
  - stores them in Postgres
  - serves a dashboard
  - provides an admin plane per organization
- They also discuss documenting a plugin mechanism (listener installed via hooks/plugin).
Project Name
The project is narrowed and renamed multiple times. The final (temporary) repo/project name:
- “Slop Watch” (chosen over “Yardstick”)
They later refer to it as:
- “Slop Watch server” and the “Slop Watch” platform (the transcript sometimes renders the name as “Slot Watch”)
Project Artifacts Created During the Stream
At least two key research/documentation files are produced:
- Research on coding agent ingestion differences
- Research/decisions capturing V1 architecture and domain language
They also:
- Start a repository (inside a “repos AI” org)
- Publish it publicly to GitHub
Main Speakers / Sources
- Main speaker: Matt (referred to as “Matt Pocock” in discussion; also addressed as “Matt” in chat)
- Primary AI assistant used in-stream: Claude (invoked via Claude Code and the “Grill Me” skill)
Other cited tools/agents/systems
- Wispr Flow
- Sandcastle
- Cursor CodeX (Codex)
- Pi
- OpenCode
- GitHub Copilot CLI
- Vercel workflows (mentioned)
- OIDC/IDP (concept explanation)
- Postgres
- JSONL
- DDD
- tRPC (mentioned as an idea)
- OPA/WorkOS (briefly referenced)
Category
Technology