Summary of "LIVE: Watch me build a brand-new project from scratch"

Overview

A rare live stream where the speaker plans and “vibe-codes” the early architecture and vocabulary for a new greenfield project: a self-hosted “coding agent observability” platform. The exploration starts broad (e.g., ideas like an AI coding Kanban / token tracking dashboards) and then narrows to observability as the “missing layer.”

Core Product Idea: “Coding Agent Observability”

The goal is a system that helps individuals and/or teams see what coding agents do, including:

Token/cost accounting per session (tokens spent, model usage, context window usage)
Session success/productivity signals
Model/tool observability (which models/prompts/tools were used)
Leaderboards/metrics and possibly aggregate analytics for org-level comparisons
Ability to drill into a specific engineer’s session for debugging/feedback (DRI = “directly responsible individual” for improving agent usage)
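
As a rough illustration of the per-session token/cost accounting listed above, the sketch below sums tokens and estimated cost per session. The record fields (session_id, model, input_tokens, output_tokens), the model names, and the per-million-token prices are all assumptions made for this example, not details from the stream.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens: (input, output). Not real pricing.
PRICES = {"model-a": (3.0, 15.0), "model-b": (1.0, 5.0)}


@dataclass
class ModelRequest:
    session_id: str
    model: str
    input_tokens: int
    output_tokens: int


def summarize(requests):
    """Return {session_id: {"tokens": int, "cost_usd": float}}."""
    totals = defaultdict(lambda: {"tokens": 0, "cost_usd": 0.0})
    for r in requests:
        in_price, out_price = PRICES[r.model]
        row = totals[r.session_id]
        row["tokens"] += r.input_tokens + r.output_tokens
        row["cost_usd"] += (
            r.input_tokens * in_price + r.output_tokens * out_price
        ) / 1_000_000
    return dict(totals)
```

A leaderboard or org-level view could then be built by grouping the same rows by developer instead of by session.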

Key Differentiator: The “Missing Layer”

The argument: AI observability exists for applications, but there’s a missing layer for observability of “your own coding agents.” In other words: instrument agent runs and view them centrally (or within the org).

Primary Target & UX Decisions

Using “Grill Me” questioning, they weigh three candidate UX focuses:

A: Individual developer deep dive (personal timeline per session)
B: Cohort/team dashboards (comparisons across developers)
C: Aggregate metrics only

They converge on A with support for B, because managers/DRIs need to review specific engineers’ sessions.

Privacy/Consent Model Discussion

Coding sessions may contain:

Secrets
PII
Half-formed thoughts

Main direction:

Prefer org-visible data with a clear internal trust/visibility model.
Avoid per-session opt-in, which would reduce data quality.
Strong preference for on-prem / self-hosted so data never leaves the organization.

Technical Core: “Ingestion Spine” (Hooks vs JSONL)

A major technical section addresses how to capture agent events reliably across many different coding engines. They explore:

Hooks-based triggers (agent-provided lifecycle events)
JSONL append-only files as a backstop
Per-agent adapters, because agents expose different event surfaces and payload completeness
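
To make the JSONL backstop idea concrete, here is a minimal sketch of reading only newly appended lines from an append-only session file. The function name read_new_events and the byte-offset bookkeeping are assumptions for illustration; the stream does not specify how the file would be tailed.

```python
import json


def read_new_events(path, offset=0):
    """Read complete JSON lines appended after byte `offset`.

    Returns (events, new_offset). A trailing partial line (no newline yet)
    is left in place so the next call can pick it up once it is complete.
    """
    events = []
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a line still being written
            offset += len(line)
            if line.strip():
                events.append(json.loads(line))
    return events, offset
```

Persisting the returned offset between calls lets a capturer resume after a restart without re-sending events it has already read.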

Coding agents explicitly considered

Claude Code
Cursor CodeX
Pi
OpenCode
GitHub Copilot CLI

Findings

A single universal hook payload is insufficient (some include little/no message content).
Adapters are unavoidable: each agent needs a specialized ingestion adapter to normalize events into the platform’s internal schema.
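
The adapter finding can be sketched as a small registry that maps each agent to a function normalizing its raw payload. Everything here is hypothetical: the agent names "agent-a" and "agent-b", the raw payload keys, and the NormalizedEvent fields stand in for whatever the real agents and internal schema look like.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NormalizedEvent:
    agent: str
    session_id: str
    kind: str             # e.g. "user_message", "assistant_message"
    text: Optional[str]   # None when the agent's payload omits content


def adapt_agent_a(raw: dict) -> NormalizedEvent:
    # Hypothetical agent whose hook payload includes message content.
    return NormalizedEvent("agent-a", raw["session_id"], raw["event"], raw.get("message"))


def adapt_agent_b(raw: dict) -> NormalizedEvent:
    # Hypothetical agent whose hook payload carries only IDs and event names.
    return NormalizedEvent("agent-b", raw["sid"], raw["type"], None)


ADAPTERS = {"agent-a": adapt_agent_a, "agent-b": adapt_agent_b}


def normalize(agent: str, raw: dict) -> NormalizedEvent:
    """Dispatch a raw payload to the adapter registered for its agent."""
    try:
        adapter = ADAPTERS[agent]
    except KeyError:
        raise ValueError(f"no adapter registered for agent {agent!r}")
    return adapter(raw)
```

Keeping text optional reflects the point that some payloads carry little or no message content, which is where a JSONL backstop would fill the gap.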

Data Capture Architecture: Sidecar / Per-Session Capture

They settle on an approach after discovering hooks alone aren’t enough:

Use a local installed capture component that runs alongside the agent.
A short-lived “sidecar” process is spawned by a hook at session start and ends when the session ends.

This reduces the operational risk of forgetting to run an always-on daemon.

Tradeoffs considered

Concurrent sessions may spawn multiple capture processes.

Identity / Authentication Model (Self-hosted Backend)

They initially consider:

OIDC (OpenID Connect) with an IDP (Identity Provider)

They pivot to a simpler v1 approach:

Admin-minted per-user tokens (created via backend admin UI)
Developers run a local install/login using the token
Backend verifies bearer tokens
Leaves a clean upgrade path (for example, to OIDC later) and supports de-provisioning by revoking a user’s token
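
The token flow above might look roughly like the following. This is a sketch under assumptions: the TokenStore class and its method names are invented for illustration, the in-memory dict stands in for a database table, and storing only SHA-256 digests of tokens is a common practice rather than something stated in the stream.

```python
import hashlib
import secrets


class TokenStore:
    """In-memory stand-in for a tokens table; keeps only token digests."""

    def __init__(self):
        self._digest_to_user = {}

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def mint(self, user: str) -> str:
        """Admin action: create a token for a user and return it once."""
        token = secrets.token_urlsafe(32)
        self._digest_to_user[self._digest(token)] = user
        return token

    def revoke_user(self, user: str) -> None:
        """De-provision a user by dropping all of their tokens."""
        self._digest_to_user = {
            d: u for d, u in self._digest_to_user.items() if u != user
        }

    def authenticate(self, authorization_header: str):
        """Return the user for a 'Bearer <token>' header, or None."""
        scheme, _, token = authorization_header.partition(" ")
        if scheme.lower() != "bearer" or not token:
            return None
        return self._digest_to_user.get(self._digest(token))
```

Because the backend only ever maps a presented credential to a user, swapping this lookup for a different identity mechanism later should not change the rest of the ingestion path.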

Backend / Server Architecture Directions

Open decisions include:

Deployment unit: single binary vs multiple services (avoid Kubernetes initially)
Storage preference: likely Postgres over SQLite
Separation: backend/storage separated from the capture binary
Live updates: they consider polling instead of strict 100ms streaming
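
For the polling option, one common shape is cursor-based polling: the dashboard remembers the last event it saw and asks for anything newer. The EventLog class below is a hypothetical in-memory stand-in; with Postgres the same idea would be a query filtered on an increasing event ID.

```python
class EventLog:
    """Append-only log where an event's position serves as its cursor."""

    def __init__(self):
        self._events = []

    def append(self, payload: dict) -> int:
        """Store an event and return the cursor just past it."""
        self._events.append(payload)
        return len(self._events)

    def since(self, cursor: int, limit: int = 100):
        """Return (events after `cursor`, next cursor to poll with)."""
        batch = self._events[cursor:cursor + limit]
        return batch, cursor + len(batch)
```

A client would call since() on a timer with the cursor from its previous call; an empty batch simply means nothing new has arrived.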

Domain Modeling & “Ubiquitous Language” (DDD-inspired)

To structure the database and UI, they create a domain glossary, referencing:

DDD (Domain-Driven Design)
Ubiquitous language

Included concepts:

Coding agent
Session: one agent run attached to a dev, including working directory and agent version
Turn: one initial user message plus the full assistant response (later refined)
Model request: one model invocation (HTTP call in their conceptual model)

Key structural concept: Session as a DAG of Turns

A Session contains a DAG of turns to represent branching/forking.
Most agents likely produce a straight-line sequence (a degenerate DAG).
Some agents (notably Pi, per discussion) may produce real branching.
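
A minimal sketch of the session-as-DAG idea follows. It assumes each turn records a single parent, which makes the structure a tree (a special case of a DAG); supporting merges would need a list of parents instead. The Turn fields and helper names are illustrative, not the schema from the stream.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    id: str
    parent_id: Optional[str]  # None for the first turn in a session


def children_of(turns):
    """Map each turn ID to the IDs of turns that branch from it."""
    kids = {t.id: [] for t in turns}
    for t in turns:
        if t.parent_id is not None:
            kids[t.parent_id].append(t.id)
    return kids


def leaves(turns):
    """Turn IDs with no children: one per branch tip."""
    return [tid for tid, c in children_of(turns).items() if not c]


def path_to(turns, turn_id):
    """Turn IDs from the session's first turn down to `turn_id`."""
    by_id = {t.id: t for t in turns}
    path = []
    current = turn_id
    while current is not None:
        path.append(current)
        current = by_id[current].parent_id
    return path[::-1]
```

A straight-line session has exactly one leaf, while a fork shows up as a turn with more than one child.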

Edge cases iterated

Forking/branch abandonment
Rewinding turns (treated as branching in the DAG)
Resumes/compactions: whether “resume” is the same session (as DAG markers) or a new session linked by parent session ID
Sub-agents: modeled as child sessions with parent session relationships
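
The sub-agent modeling can be illustrated with sessions that point at a parent session, which lets usage roll up from child sessions to the session that spawned them. The Session fields and the total_tokens helper are assumptions for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    id: str
    parent_session_id: Optional[str] = None  # set for sub-agent sessions
    tokens: int = 0


def total_tokens(sessions, root_id):
    """Tokens for a session plus all of its descendant sessions."""
    children = {}
    for s in sessions:
        children.setdefault(s.parent_session_id, []).append(s)
    by_id = {s.id: s for s in sessions}
    total, stack = 0, [by_id[root_id]]
    while stack:
        s = stack.pop()
        total += s.tokens
        stack.extend(children.get(s.id, []))
    return total
```

The same parent link could also represent a resumed session, if resumes end up modeled as new sessions rather than markers inside one session.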

“Buildable V1” Shape

They converge on an early architecture:

User runs a local listener/capturer (plugin or installed component) near the agent.
The listener sends session events to a self-hosted server.
The server:
- receives events
- stores them in Postgres
- serves a dashboard
- provides an admin plane per organization
They also discuss documenting a plugin mechanism (listener installed via hooks/plugin).
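
As a sketch of the listener-to-server hop, the function below builds (but does not send) an HTTP request carrying a batch of session events. The /ingest path, the {"events": [...]} body shape, and the function name are hypothetical; only the use of a bearer token comes from the identity discussion.

```python
import json
import urllib.request


def build_ingest_request(server_url, token, events):
    """Build a POST request that would deliver `events` to the server."""
    body = json.dumps({"events": events}).encode("utf-8")
    return urllib.request.Request(
        url=f"{server_url.rstrip('/')}/ingest",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to urllib.request.urlopen would perform the actual send; batching several events per request keeps the listener cheap to run alongside the agent.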

Project Name

The project is narrowed and renamed multiple times. The final (temporary) repo/project name:

“Slop Watch” (chosen over “Yardstick”)

They later refer to it as:

The “Slop Watch server” and the “Slop Watch” platform (“Slot Watch” also appears, likely a transcription error)

Project Artifacts Created During the Stream

At least two key research/documentation files are produced:

Research on coding agent ingestion differences
Research/decisions capturing V1 architecture and domain language

They also:

Start a repository (inside a “repos AI” org)
Publish it publicly to GitHub

Main Speakers / Sources

Main speaker: Matt (referred to as “Matt Pocock” in discussion; also addressed as “Matt” in chat)
Primary AI assistant used in-stream: Claude (invoked via Claude Code and the “Grill Me” skill)

Other cited tools/agents/systems

Wispr Flow
Sandcastle
Cursor CodeX (Codex)
Pi
OpenCode
GitHub Copilot CLI
Vercel workflows (mentioned)
OIDC/IDP (concept explanation)
Postgres
JSONL
DDD
tRPC (mentioned as an idea)
OPA/WorkOS (briefly referenced)
