Summary of "I cut my OpenClaw API bill by 80% with one config change"
Problem
- OpenClaw routes every request to your primary model by default. Trivial tasks (heartbeats, quick lookups, sub-agent work) use the same expensive model as complex reasoning, which wastes money.
- There is no provider fallback by default, so a single provider's rate limits can halt your agent.
Solution: model tiering + config change
Use different models for different task tiers and add cross-provider fallbacks to avoid outages.
Model tiers
- Frontier models (heavy reasoning)
  - Examples: Opus, GPT-5.2 (architecture, large refactors, and other complex reasoning).
- Mid-tier models (daily work)
  - Examples: Sonnet, DeepSeek R1 (code generation, research, content).
- Cheap / utility models (background, heartbeats)
  - Examples: Gemini Flash-Lite, DeepSeek V3.2, GLM 4.7 (heartbeats, quick lookups, classification, background checks).
Fallbacks
- Add a fallback chain that spans providers (e.g., fall back to GPT-5.2 or another provider's model before a same-provider mid-tier) to prevent total outages when one provider is rate-limited.
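One way to express such a chain in a config is an ordered list that a gateway tries top to bottom. This is an illustrative sketch only: the key name and model identifier strings below are assumptions, not the actual OpenClaw schema.

```json
{
  "fallbacks": [
    "openai/gpt-5.2",
    "anthropic/claude-sonnet"
  ]
}
```

The first fallback deliberately crosses providers (OpenAI), so an Anthropic-wide rate limit cannot stall the agent; only after that does the chain drop to a same-provider mid-tier model.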
Model cost / performance examples
- Opus: ~$30 per million tokens; ~50 tokens/sec.
- GPT-5.2: similar premium pricing.
- Sonnet / DeepSeek R1: mid-tier; cheaper with good reasoning (DeepSeek R1 cited at $2.74 per million tokens).
- Gemini 2.5 Flash-Lite: $0.50 per million tokens.
- DeepSeek V3.2: $0.53 per million tokens.
- Gemini 3 Flash: ~250 tokens/sec (faster than Opus).
- Cheap models can be ~60x cheaper than Opus for trivial requests.
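The ~60x figure follows directly from the prices above; a quick check (the 2,000-token heartbeat size is a made-up example, not from the video):

```python
# Prices cited in the summary, in dollars per million tokens.
OPUS_PRICE = 30.00        # frontier model
FLASH_LITE_PRICE = 0.50   # cheap/utility model

# Ratio between frontier and utility pricing for the same request size.
ratio = OPUS_PRICE / FLASH_LITE_PRICE
print(ratio)  # 60.0

# Cost of a hypothetical 2,000-token heartbeat on each tier.
tokens = 2_000
print(OPUS_PRICE * tokens / 1_000_000)        # 0.06
print(FLASH_LITE_PRICE * tokens / 1_000_000)  # 0.001
```

At tens of thousands of heartbeats per month, that per-request difference is where most of the savings come from.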
Implementation options
- Manual configuration (recommended)
- More control: explicitly map tasks to models and define fallbacks.
- OpenRouter auto-router
- Automatic routing based on prompt complexity; less control but minimal setup.
Practical how-to (steps shown)
- Edit the OpenClaw config file: ~/.openclaw/openclaw.json.
- Define model sections for heartbeats, sub-agents, vision tasks, and fallback chains.
- Create aliases for model shortcuts (e.g., opus, sonnet, flash, ds).
- Restart the OpenClaw gateway.
- Switch models on the fly with commands like /model sonnet or /model opus, or use /models to list providers.
- Test to ensure heartbeats and background tasks now use cheaper models while primary agent tasks still use the frontier model.
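The steps above can be sketched as a single file. All key names and model identifier strings here are illustrative assumptions; the actual openclaw.json schema and snippets are in the video description.

```json
{
  "agents": {
    "primary": { "model": "anthropic/claude-opus" },
    "subagent": { "model": "deepseek/deepseek-r1" },
    "heartbeat": { "model": "google/gemini-2.5-flash-lite" }
  },
  "fallbacks": {
    "anthropic/claude-opus": ["openai/gpt-5.2", "anthropic/claude-sonnet"]
  },
  "aliases": {
    "opus": "anthropic/claude-opus",
    "sonnet": "anthropic/claude-sonnet",
    "flash": "google/gemini-2.5-flash-lite",
    "ds": "deepseek/deepseek-r1"
  }
}
```

The aliases are what make on-the-fly switching (e.g., /model sonnet) quick to type.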
Config highlights (example)
- Heartbeats → Gemini 2.5 Flash-Lite (~$0.50/M tokens) instead of Opus.
- Sub-agents → DeepSeek R1 (~$2.74/M tokens) instead of Opus.
- Primary tasks remain on Opus/GPT-5.2.
- First fallback uses a different provider (e.g., GPT-5.2) to avoid provider-wide rate limits.
Savings examples and tools
- Interactive calculator: https://calculator.vlvt.sh
- Example savings (monthly):
  - Light user: ~$200 → ~$70 (≈65% less).
  - Power user: ~$943 → ~$347 (≈63% less; ≈$600 saved/month).
  - Heavy user: ~$3,000 → ~$1,000 (≈67% less; ≈$2,000 saved/month).
- You can plug in different primary/heartbeat/sub-agent models to see custom savings and copy config snippets.
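The calculator's arithmetic can be approximated in a few lines. The prices are the ones cited above; the 30M-token volume and the 10/10/10 split are hypothetical, chosen only to illustrate the shape of the savings:

```python
# Prices cited in the summary, $ per million tokens (input/output lumped
# together for a rough estimate).
PRICES = {
    "opus": 30.00,        # frontier (primary tasks)
    "deepseek-r1": 2.74,  # mid-tier (sub-agents)
    "flash-lite": 0.50,   # cheap (heartbeats, background)
}

def monthly_cost(usage_mtok: dict) -> float:
    """Estimate monthly spend given millions of tokens per model."""
    return sum(PRICES[model] * mtok for model, mtok in usage_mtok.items())

# Hypothetical user: 30M tokens/month, everything on the frontier model.
before = monthly_cost({"opus": 30})
# Same volume after tiering: 10M stays on Opus, 10M moves to sub-agents,
# 10M moves to heartbeats/background.
after = monthly_cost({"opus": 10, "deepseek-r1": 10, "flash-lite": 10})

print(before)                       # 900.0
print(round(after, 2))              # 332.4
print(f"{1 - after / before:.0%}")  # 63%
```

A ~63% reduction on an even three-way split lines up with the calculator's example figures; shifting more traffic to the cheap tier pushes the savings higher.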
Why not rely on free models
- Free tiers commonly have aggressive rate limits, slow performance, and can disappear unexpectedly.
- The recommendation is to use near-free paid models (tens of cents per million tokens) for production reliability.
Links and resources mentioned
- Config examples and openclaw.json snippets: referenced in the video description.
- Savings calculator: https://calculator.vlvt.sh
- Model/provider lists and aliases: available in the video description.
Actionable checklist to apply now
- Add model tiering to ~/.openclaw/openclaw.json:
  - Heartbeat → cheap model
  - Sub-agents → mid-tier
  - Primary → frontier
- Add a cross-provider fallback chain.
- Create model aliases for quick switching.
- Restart the gateway and test switching with /model commands.
- Use the calculator (calculator.vlvt.sh) to estimate savings for your usage.
Main speaker and sources
- Video presenter / author (demonstrates the config and calculator).
- Providers/models referenced: Anthropic (Opus, Sonnet), OpenAI (GPT-5.2), Google (Gemini Flash variants), OpenRouter, DeepSeek (R1, V3.2), GLM 4.7, Kimi K2.5.
Category
Technology