Summary of "I Built a Coding Agent That Runs Locally for Free"

Summary of the Video

The video demonstrates a “coding agent” workflow that runs locally and uses free/open-source models to build software autonomously. The agent:

Takes a natural-language request (e.g., “describe what you want to build”)
Plans features and adds them to a Kanban board (e.g., backlog → in progress)
Implements features automatically using an “autopilot”/queue runner
Performs end-to-end testing by opening a browser and running Playwright checks
Produces logs/output, including screenshots captured during testing
Is presented as open source, with encouragement to download, fork, and self-host

Demo: Retro To-Do App

A key example shows building a retro to-do app:

The user requests a feature (“tip of the day”), which becomes a backlog item
Running the queue moves the task from backlog to in progress
The agent’s SDK output is shown, alongside visual verification (screenshots + logs)

Emphasis: Free Models vs Frontier Paid Models

The speaker argues this approach works well with free models, contrasting it with an earlier need for expensive frontier paid models to achieve similar outcomes. They also claim that small local models can be strong at:

Tool calling
Writing code
Multi-model / vision capabilities (used so the agent can “see” browser results)

Local Model Choices and Setup (Tutorial Content)

Recommended Local Coding Models

Qwen 3.6 (35B) is personally recommended.
JML4 (possibly referenced as a “Yi/DeepSeek”-style model name in subtitles) is mentioned, but it requires the 31B variant; smaller sizes are said to be insufficient for tool calling.
Testing note: the speaker used an RTX 4070 and said it worked.

How to Download/Run Models

The video covers two local inference options:

LM Studio
- Search for a model (e.g., “Qwen 3.6”)
- Download it
- Configure it so the coding agent can use it
llama.cpp / “a llama” tool
- Install via a command referenced from llama.com (as transcribed)
- Copy a terminal command to download the model

Important Configuration: Context Length

A major tuning point is that the agent needs a large context window:

The author warns LM Studio may default to a small context length (about 4,000 tokens in subtitles).
They recommend increasing to at least 64,000 tokens, and 128,000 if possible.

Local Forge Installation and Configuration (Tutorial Content)

The workflow is referred to as Local Forge.

Setup Steps

Download the project from a URL (linked in the description)
Star the repository on GitHub (for support)
Install Node.js
Run a platform-specific startup script:
- Windows: start.bat
- macOS/Linux: started.sh (as transcribed)

First Run

The startup script installs dependencies and prints a local URL to open in a browser.
Local Forge is configured with a provider, such as:
- Alum Studio (the current running setup)
- Also supports a llama option
Local Forge can auto-detect previously downloaded models.
A default model is selected (example: Qwen 3.6).

Agent Execution Features and Controls

Concurrency

The speaker recommends running one agent at a time due to resource concerns.
If hardware allows, they mention up to three agents concurrently.

Playwright Browser Verification

Playwright testing can be enabled/disabled:

Browser mode:
- Headless (not shown)
- Headed (browser window visible)
The demo uses headed mode so visual verification is visible.

Multiple Workspaces

Local Forge can run multiple workspaces in parallel
Each workspace maintains its own feature board/tasks.

Workspace/Project Creation Modes

Blank project
- Starts with no items; features are added manually.
AI-described example
- Loads an example project and auto-populates a large backlog (e.g., 19 features).
- Features include: title, detailed description, acceptance criteria, priority, and dependencies.
Describe project to AI
- A chat UI generates a feature plan (example: building a Confluence-like app).
- The agent proposes requirements such as:
  - multi-user auth
  - rich text editor
  - architecture/feature brainstorming
- Then generates a feature list (example: ~15 features).

“Caveat” About Free Models (Advice)

The author explicitly sets expectations:

Free models can be impressive for code writing, but output quality depends heavily on context size and how much detail is provided.
More detailed feature definitions lead to better implementations.

Workaround suggestion:

Optionally use a more capable/paid model (referred to as Claude code) to plan features more precisely,
Then use the free models to execute implementation via the coding agent.

Skills / Tooling Integration: “Local Forge Skill”

The video mentions an agent “skill” integration:

The author adds a local forge skill
The agent can:
- Create a new project using the skill
- Add features to an existing project

Example project: “Infinite draw” (infinite canvas)

Suggested to use a single-user setup
Mentions a potential backend approach like Sequelize + a SQL database (as transcribed: “sequel I database”)

Reviews / Guides / Tutorials Explicitly Presented

The video includes a step-by-step workflow covering:

Downloading and installing Local Forge
Installing Node.js
Running startup scripts
Configuring provider/model in Local Forge
Installing/running local models using LM Studio (optionally via a llama-based approach)
Adjusting context length to 64k–128k tokens
Enabling Playwright verification in headless/headed modes

It also outlines a practical end-to-end cycle:

Create workspace
Generate feature list (AI-assisted)
Run queue
Monitor Kanban progress
View agent logs and browser-test screenshots

Plus guidance and caveats on:

Free model limitations
The importance of detailed feature specs
Using a stronger model for planning only (optional)

Main Speakers / Sources (From the Subtitles)

Primary Speaker

The video’s author (single presenter demonstrating Local Forge + local model setup; identity not provided in subtitles)

Tools/Products Referenced

Local Forge
Alum Studio (local model provider/model server)
llama (download runner tool via llama.com)
LM Studio
Playwright
Node.js
Claude / Claude code (referenced as a planning model option)

Share this summary

Is the summary off?

If you think the summary is inaccurate, you can reprocess it with the latest model.

Summarize another video