Summary of "We all know bash sucks. Why make our agents suffer?"
High-level thesis
Current practice gives LLM agents a single “bash” tool so they can run terminal commands (read files, run builds, edit code, install packages) instead of pasting huge amounts of code into the prompt. Bash is an important stepping stone but not sufficient long-term: it’s missing types, standards, security/permission semantics, multi-tenant isolation, and determinism. The next generation of agent infrastructure should use typed, sandboxed execution (often TypeScript/JavaScript) and SDK-style tool interfaces so agents can fetch only the tiny context they need and run deterministic code.
Key technical problems
Tokenization and context window limits
- LLMs are token-based autocompleters with limited context. Dumping whole repositories into prompts:
  - wastes tokens,
  - reduces model quality,
  - increases cost.
- Small files can be >1k tokens; adding many files to history makes behavior nondeterministic.
- Recommendation: avoid repo-wide paste strategies (the video criticizes tools like Repo Mix); instead, fetch minimal, relevant code snippets.
Determinism vs non‑determinism
- LLM outputs are inherently non‑deterministic; relying on the model to find/filter the right content in a giant context is unreliable.
- Better approach: have the agent generate short, deterministic commands (e.g., ripgrep‑like queries or SDK calls) that fetch exact context reliably.
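A minimal sketch of this deterministic-fetch pattern, assuming a Node.js runtime with ripgrep (`rg`) on the PATH; the helper name and flags are illustrative, not from the video:

```typescript
// Fetch minimal context deterministically instead of pasting whole files.
import { execFileSync } from "node:child_process";

function fetchContext(pattern: string, dir = "src"): string {
  try {
    // -n: show line numbers; -m 20: cap matches per file to keep output small.
    // The same query returns the same lines for the same repo state.
    return execFileSync("rg", ["-n", "-m", "20", pattern, dir], {
      encoding: "utf8",
    });
  } catch {
    return ""; // rg exits non-zero when nothing matches
  }
}

// The model receives a handful of matching lines, not the whole repository.
console.log(fetchContext("parseConfig\\("));
```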
Bash limitations
- Pros:
  - Universal, text-driven, familiar.
  - A single tool can perform many tasks.
- Cons:
  - No standard typing for inputs/outputs (the contrast is sketched below).
  - No way to mark “destructive” vs “read-only” operations.
  - Poor multi-tenant isolation and no structured permission model.
  - CLI help text and pasted tool specs bloat the model’s context.
  - UX problem: frequent approval prompts cause blind “yes” behavior or insecure skip modes.
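A small illustration of the typing gap (my example, not from the video): the same directory listing as free-form shell text the model must parse, versus a typed runtime call the host can reason about:

```typescript
import { execSync } from "node:child_process";
import { readdirSync } from "node:fs";

// Untyped: free-form text; the model has to guess what each column means,
// and the host can't tell whether the command was read-only or destructive.
const raw = execSync("ls -l", { encoding: "utf8" });
console.log(raw);

// Typed: structured entries, and the call is knowably read-only.
const entries = readdirSync(".", { withFileTypes: true });
for (const e of entries) {
  console.log(e.name, e.isDirectory() ? "dir" : "file");
}
```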
Alternatives and solutions
Tool calling + code-as-middleware
- Convert tool/CLI specs into SDKs (TypeScript/JavaScript) the model can write code against, rather than passing tools as raw text.
- Cloudflare Code Mode and similar approaches provide discoverable TypeScript SDKs so agents write code to fetch/filter data deterministically.
- Benefits: fewer tokens, lower latency, better accuracy.
- Example benchmark: average tokens reduced from ~43.5k to ~27k (roughly a 38% reduction); accuracy improved by ~3 points in some tests.
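A hedged sketch of the code-as-middleware pattern; `sdk.issues.list` is a hypothetical binding generated from a tool spec, not a real Cloudflare API:

```typescript
// "Code as middleware": instead of hundreds of raw records landing in the
// prompt, the model writes code against a typed SDK inside the sandbox,
// and only the filtered result crosses back into its context.
interface Issue {
  title: string;
  labels: string[];
}

interface Sdk {
  issues: { list(repo: string): Promise<Issue[]> };
}

async function recentBugTitles(sdk: Sdk): Promise<string[]> {
  const all = await sdk.issues.list("acme/widgets"); // runs in-process
  return all
    .filter((i) => i.labels.includes("bug"))
    .slice(0, 10)
    .map((i) => i.title); // ~10 short strings reach the model, not the dump
}
```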
Typed, sandboxed execution environments (TypeScript/JS)
- Give agents a strongly typed, isolated runtime to call tools via typed interfaces.
- Advantages:
  - Portable, shareable environment configuration (TypeScript files describe the agent API).
  - Strong typing enables richer permission/approval rules (auto-approve read-only; require approval for destructive writes), as sketched after this list.
  - Runs in isolates (V8, Node workers, Cloudflare Workers, browser) so many agents can share a kernel safely.
  - Deterministic behavior for filtering/processing, because code executes in-process rather than being simulated by the LLM.
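One way the typing-plus-permissions idea could look; the `Tool` shape and `effect` field are illustrative assumptions, not a published interface:

```typescript
import { readFile, unlink } from "node:fs/promises";

// Typed tool definitions carry an explicit effect, so the runtime (not the
// model) decides what needs human approval.
type Effect = "read" | "write";

interface Tool<In, Out> {
  name: string;
  effect: Effect;
  run(input: In): Promise<Out>;
}

const readFileTool: Tool<{ path: string }, string> = {
  name: "readFile",
  effect: "read", // safe to auto-approve
  run: ({ path }) => readFile(path, "utf8"),
};

const deleteFileTool: Tool<{ path: string }, void> = {
  name: "deleteFile",
  effect: "write", // destructive: gate behind approval
  run: ({ path }) => unlink(path),
};
```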
- Example patterns and projects:
  - just-bash / just-js: virtualized bash implemented in JS/TS, so agents “think” they have bash but run safely in memory.
  - Vercel’s fake-bash discussions and related debates.
  - Reese’s Executor: an execution environment for safe, typed tool calls.
  - Cloudflare Code Mode: converts MCP/tool specs into TypeScript SDKs.
  - Dax’s experiments removing bash and having agents write JS directly.
Sandboxing and secure execution vendors
- Vendors/solutions that run agent-generated JS/TS securely and per user:
  - Rivet
  - Daytona (sponsor)
  - Other emerging sandbox providers
These solutions enable isolated execution and safer multi‑tenant deployment of agent code.
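A rough sketch of the isolation idea using Node’s worker_threads (worker threads alone are not a hard security boundary; real sandbox providers layer V8 isolates, microVMs, per-user filesystems, and network policy on top):

```typescript
import { Worker } from "node:worker_threads";

// Run agent-generated code in a worker with a hard timeout, so a runaway
// or malicious snippet can't stall the host process.
function runUntrusted(code: string, timeoutMs = 2000): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(code, { eval: true });
    const timer = setTimeout(() => {
      void worker.terminate();
      reject(new Error("timed out"));
    }, timeoutMs);
    worker.once("message", (msg) => {
      clearTimeout(timer);
      resolve(msg);
    });
    worker.once("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
  });
}

// The snippet posts its result back instead of returning it.
runUntrusted(`
  const { parentPort } = require("node:worker_threads");
  parentPort.postMessage([1, 2, 3].map((n) => n * 2));
`).then(console.log); // [2, 4, 6]
```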
Agent design and permission patterns (recommended)
- Prefer typed SDKs/APIs over raw CLI text.
- Allow agents to fetch minimal context (short grep/SDK query) instead of loading whole repos.
- Implement approval rules based on operation type (a policy sketch follows this section):
  - Auto-approve read-only operations.
  - Require explicit approval for destructive writes.
- Support wildcard approvals and role‑based access for teams.
- Provide multi‑account and cross‑tool signing/approval sharing (SSO for agent sessions) to avoid approval fatigue and unsafe “always approve” habits.
- Use per‑user isolates or virtual file systems so agents can run on shared infrastructure without escaping to other users’ data.
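A sketch of approval rules layered on typed tools, reusing the `effect` idea from above; the wildcard-grant shape is an assumption for illustration:

```typescript
type Effect = "read" | "write";

interface ToolCall {
  tool: string;
  effect: Effect;
  target: string; // e.g. a file path or resource URL
}

// Patterns the user has already granted ("always allow writes under /tmp").
const wildcardGrants: RegExp[] = [/^\/tmp\//];

async function approve(
  call: ToolCall,
  ask: (msg: string) => Promise<boolean>,
): Promise<boolean> {
  if (call.effect === "read") return true; // read-only: auto-approve
  if (wildcardGrants.some((re) => re.test(call.target))) return true;
  return ask(`Allow ${call.tool} on ${call.target}?`); // destructive: prompt
}
```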
Practical demos, sponsors, and product mentions
- Browserbase demo (sponsor): GPT‑5.4 generates and executes JS in the browser to interact with complex web UIs (demo: playing Wordle by writing JS to manipulate page state).
- Depot (sponsor): alternative CI/CD runner and debugging experience; agent‑friendly CLI and low runtime pricing.
- Daytona (sponsor): sandbox/secure execution provider.
- T3 Code: the presenter’s free, open-source project, which provides the context for this exploration.
- Other tools/platforms mentioned: Cursor, Claude Code, Codex CLI, Cloudflare Code Mode, just-bash/just-js, Rivet, Executor, Repo Mix (criticized).
Measured benefits
- Having agents write and run code (TS/JS), instead of relying on the LLM to filter large contexts, yields:
  - Large reductions in token usage and latency.
  - Improved accuracy and determinism for data filtering and tool orchestration.
  - Better composability: local filtering returns concise results to the model.
Actionable takeaways
- Don’t paste entire repositories into prompts; prefer commands or SDK calls to fetch small, relevant snippets.
- Favor typed SDKs and sandboxed JS/TS runtimes for agent execution when possible.
- Build approval/permission models layered on typed tool interfaces (auto‑approve safe ops; require checks for destructive ops).
- Use or build virtualized per‑user environments (isolates/virtual filesystems) so agents can run on shared infra safely.
- Explore “agents write code” patterns (models generate small scripts/TS functions to fetch/filter/transform data deterministically).
Recommended readings and resources
- Presenter’s earlier MCP videos on converting tools to SDKs / code‑mode patterns.
- Reese’s write-up on the execution layer and Executor.
- Projects to try: T3 Code, just-bash / just-js, Cloudflare Code Mode, Rivet, Daytona, Depot, Browserbase demo.
Main speakers and referenced sources
- Presenter / video host (creator of T3 Code).
- Reese (author of the execution-layer write-up; built Executor).
- Ben (built the Browserbase demo app).
- Other people/streams mentioned: ThePrimeagen, Dax (OpenCode experiments).
- Companies and models referenced: OpenAI (GPTs), Anthropic, Google/Gemini, Cloudflare, Vercel.
- Tools/companies cited: T3 Code, Cursor, Claude Code, Codex CLI, Repo Mix, just-bash / just-js, Cloudflare Code Mode, Executor, Rivet, Depot, Browserbase, Daytona.