Summary of "Save 95% of Your Tokens - OpenClaw Full Tutorial"
Goal
Reduce OpenClaw API costs dramatically (the author claims >90% savings: a personal drop from ~$100/day to <$5/day, and helping a friend go from $200/day to ~$10/day). Focus areas: model routing/selection, caching, context pruning, session initialization, local heartbeats, token audits, and practical deployment.
Quick checklist of optimizations (what to change and why)
Model routing / model selection
- Add multiple providers and models in openclaw.json (examples: Anthropic Claude — Haiku, Sonnet, Opus; OpenAI — GPT-5.1, GPT-5-mini).
- Use an inexpensive default model (e.g., Claude Haiku 4.5/4.6) and escalate to Sonnet/Opus or GPT-5 only for advanced reasoning tasks.
- Configure fallback order so agents automatically switch if a provider is rate-limited.
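Sketched as a config fragment, the routing setup might look like the snippet below. The key names and model IDs are assumptions for illustration; verify them against the OpenClaw configuration reference.

```jsonc
// Hypothetical openclaw.json fragment -- key names and model IDs are
// illustrative, not the project's confirmed schema.
{
  "models": {
    "default": "anthropic/claude-haiku-4-5",        // cheap default for routine turns
    "reasoning": "anthropic/claude-opus-4",         // escalate for hard reasoning only
    "fallbacks": ["openai/gpt-5-mini", "openai/gpt-5.1"]  // tried on rate limits
  }
}
```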
Session initialization
- Load only minimal files at session start: soul.md, user.md, and today’s memory file.
- Avoid auto-loading full conversation history or past tool outputs; run memory search only when the user asks about past context.
- At session end, write a short summary to memory (e.g., <500 words, bullet points).
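In practice these session rules would live as plain instructions in soul.md; the wording below is an illustrative sketch under that assumption, not the author's exact file:

```markdown
## Session rules (illustrative sketch)
- On start, load only soul.md, user.md, and today's memory file.
- Do not auto-load prior conversation logs or old tool outputs.
- Search memory only when I explicitly ask about past context.
- On session end, append a short summary (<500 words, bullet points)
  to today's memory file.
```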
Prompt caching (major saving)
- Enable model-side prompt caching (configure in model settings) for large/expensive models (Opus/Sonnet).
- Configure cache retention (short ~5 min inactivity, long ~1 hour). Caching reduces repeated token costs for static prompts (soul.md, user.md).
- Note: caching has a write cost but yields huge read savings. Cache-hit rates are visible in the gateway.
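To see why the write surcharge is worth paying, here is a back-of-envelope cost model. The multipliers follow Anthropic's published prompt-caching pricing (cache writes ~1.25x the base input rate, cache reads ~0.1x); the base per-token rate is an illustrative Sonnet-class figure, not a quote.

```python
# Back-of-envelope economics of prompt caching for a static prompt
# (e.g. soul.md + user.md) that is re-sent on every turn.
BASE = 3.00 / 1_000_000          # $ per input token (assumed, illustrative)
WRITE_MULT, READ_MULT = 1.25, 0.10

def session_cost(prompt_tokens: int, turns: int, cached: bool) -> float:
    """Cost of re-sending a static prompt across `turns` turns."""
    if not cached:
        return turns * prompt_tokens * BASE
    # one cache write on the first turn, then cheap cache reads
    return prompt_tokens * BASE * (WRITE_MULT + (turns - 1) * READ_MULT)

uncached = session_cost(20_000, turns=50, cached=False)
cached = session_cost(20_000, turns=50, cached=True)
print(f"uncached ${uncached:.2f} vs cached ${cached:.2f} "
      f"({100 * (1 - cached / uncached):.0f}% saved)")
```

Note the break-even behavior: a single-turn session is actually ~25% more expensive with caching (one write, no reads), but after just two or three turns the cheap reads dominate and savings climb toward ~90% for long sessions.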
Context pruning
- Add rules to prune stale tool outputs / old messages after a TTL (e.g., ~55 minutes) to prevent context windows from bloating token counts.
- Put context-pruning config in the defaults section of openclaw.json.
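A pruning rule in the defaults section could look like the fragment below; the key names (`contextPruning`, `toolOutputTtl`) are invented for illustration, so confirm the real schema in the OpenClaw docs.

```jsonc
// Hypothetical openclaw.json fragment -- key names are illustrative.
{
  "defaults": {
    "contextPruning": {
      "toolOutputTtl": "55m",     // drop stale tool outputs after ~55 minutes
      "pruneOldMessages": true
    }
  }
}
```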
Local heartbeat offload (Ollama + Llama)
- Heartbeats (periodic checks) usually hit paid APIs — route them to a free local model instead.
- Install Ollama on the VPS and run a lightweight Llama 3.2 3B model to handle heartbeat checks (it runs fine on CPU).
- Configure the heartbeat snippet in openclaw.json defaults so heartbeats call the local model, not paid providers.
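A fragment along these lines could route heartbeats locally. The key names are assumptions (check OpenClaw's actual schema); port 11434 is Ollama's real default, and Ollama does expose an OpenAI-compatible endpoint there.

```jsonc
// Hypothetical openclaw.json fragment -- key names are illustrative.
{
  "defaults": {
    "heartbeat": {
      "model": "ollama/llama3.2:3b",
      "baseUrl": "http://localhost:11434/v1"   // Ollama's OpenAI-compatible API
    }
  }
}
```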
Spending limits and budget rules
- Set platform monthly spending caps and disable auto-recharge to prevent unexpected charges.
- Add rate-limit / pacing / spending rules in soul.md (e.g., minimum seconds between API/web requests, daily/monthly budgets, notify thresholds, fallback behavior on rate-limits).
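Expressed as soul.md instructions, the budget rules might read like the sketch below (illustrative wording and numbers, not the author's exact file):

```markdown
## Budget rules (illustrative sketch)
- Wait at least 5 seconds between API/web requests.
- Daily budget: $5; monthly budget: $100.
- Notify me when spend passes 50% and 80% of any budget.
- On a provider rate limit, fall back to the next configured model
  instead of retrying the same one.
```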
Token audits & monitoring
- Use the OpenClaw gateway “Usage” tab and built-in commands to inspect token/cost usage.
- Useful slash commands: /st status, /context list, /context detail to see token counts per file, cache hit rate, and session totals.
- Run a token-audit prompt to get a session-level cost breakdown and recommendations.
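The audit prompt itself is not reproduced in the summary; an illustrative wording (an assumption, not the author's exact prompt) might be:

```text
Audit this session's token usage: list tokens and estimated cost per
loaded file and per model, report the cache hit rate, and recommend
which files to prune or cache to cut costs.
```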
Practical deployment & tooling (how to apply changes)
- Host environment: run OpenClaw in an isolated VPS (author uses Hostinger one‑click Docker + KVM plans).
- Access the server via SSH. If running in Docker, enter the container:
docker ps
docker exec -it <container-id> /bin/bash
- Edit files using:
- OpenClaw CLI inside the container, or
- VS Code Remote - SSH: open /docker/openclaw/data/openclaw for easier editing than nano.
- Files to edit:
- openclaw.json (models, defaults, heartbeat, context pruning)
- soul.md (routing rules, session rules, budget rules)
- heartbeat.md (heartbeat prompt)
- Restart to apply changes:
docker restart <container-id>
# or restart the gateway if not using Docker
- Ollama quick steps:
  - Install via the curl install script (see the Ollama docs)
  - Enable and start: systemctl enable --now ollama
  - Pull/run the model: ollama pull llama3.2:3b; ollama run llama3.2:3b
- Test with a sample prompt
Commands & UI pointers
- Common Docker commands:
docker ps
docker exec -it <container-id> /bin/bash
docker restart <container-id>
- VS Code: Remote - SSH → open folder /docker/openclaw/data/openclaw
- Gateway slash commands:
- /st status
- /context list
- /context detail
- Ollama:
systemctl start ollama
ollama pull llama3.2:3b
ollama run llama3.2:3b
# test with a small prompt
Metrics & effects shown
- Cache hit rate reported in the gateway (example: 99.9%).
- Gateway Usage tab shows: messages, tool calls, average tokens/message, cost by model and token type (cache writes/reads).
- Author demonstrates large cost reductions after implementing routing, caching, local heartbeat, and pruning.
Guides, downloads, and additional materials referenced
- Presenter created a downloadable guide (prompts, JSON snippets, ready config) linked in the video description (requires email to receive).
- Credits/inspiration: Matt Ganzic — guide/video linked in the description.
- Presenter’s other videos cover full setup, security/hardening, enabling skills/voice — this particular video focuses on cost optimization.
Best practices & security notes
- Run OpenClaw in an isolated VPS (not on your primary desktop).
- Always use API spending limits and monitor usage.
- Remove or rotate keys after testing; never publish keys in configuration files.
- Test changes in a fresh instance to avoid unexpected configurations.
Main speakers / sources
- Presenter: Tech With Tim (references his channel and discount code).
- Credit / inspiration: Matt Ganzic.
- Tools / providers referenced: OpenClaw, Hostinger, Docker, Visual Studio Code Remote-SSH, Anthropic (Claude), OpenAI (GPT models), Ollama, Llama (3.2 3B).
Notes
Exact JSON/snippet examples (openclaw.json, soul.md, heartbeat) and a condensed step‑by‑step checklist were mentioned as available in the original material.