Summary of "OpenClaw Under the Hood: Skills, Crons, and Agents in Real-World Use" (original title in Russian: "OpenClaw под капотом: скиллы, кроны и агенты в реальной работе")
High-level summary
This is a hands‑on walkthrough / troubleshooting session about running and customizing an OpenClaw (auto‑transcribed as “crab”) agent/assistant stack on a personal Mac Mini. The presenter (Alexey) explains architecture choices, memory and storage strategies, skills (plugins), cron jobs, agent patterns, fallback model routing, file/audio handling, and practical pitfalls encountered in real use (updates, indexing failures, subscription/limit management).
Focus is practical: how the system is assembled, how data flows (raw logs → chunking → embeddings → vector DB / SQL), how skills and agents are organized and invoked, which file and media types the setup can handle, and how to make the system resilient (model failover, backups, cron checks).
Key technological concepts and architecture
Hybrid memory architecture
- Raw message log files
- Each incoming message saved as a JSON line on disk (flight recorder). Keeps raw data for insurance/forensics.
- File‑based long‑term notes
- Manual, high‑importance facts stored in markdown (.md) files (rules, diary, family, projects).
- Indexing pipeline
- A background Memory plugin monitors JSON files, chunks text semantically, calls an LLM/API to create embeddings, and stores vectors in a vector DB.
- Metadata and texts are stored in an SQL store for retrieval and audit.
- Vector DB + SQL
- Vector DB holds semantic vectors; SQL stores text/metadata and retrieval indices.
- Postgres recommended for robust setups; SQLite OK for single‑user/light setups but can break or lose indexing after updates.
- Retention strategy
- Raw JSON files retained (speaker bumped from default 7 days to 30 days) as insurance against DB loss during updates.
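The "flight recorder" layer described above can be sketched as a small append-and-prune module. The directory layout, field names, and the 30-day window are assumptions for illustration; they are not OpenClaw's actual file format.

```python
import json
import time
from datetime import datetime, timezone
from pathlib import Path

LOG_DIR = Path("memory/raw")   # hypothetical layout
RETENTION_DAYS = 30            # bumped from the 7-day default, per the talk

def append_message(msg: dict, log_dir: Path = LOG_DIR) -> Path:
    """Append one incoming message as a JSON line (the 'flight recorder')."""
    log_dir.mkdir(parents=True, exist_ok=True)
    day_file = log_dir / f"{datetime.now(timezone.utc):%Y-%m-%d}.jsonl"
    record = {"ts": time.time(), **msg}
    with day_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return day_file

def prune_old_logs(log_dir: Path = LOG_DIR, days: int = RETENTION_DAYS) -> list:
    """Delete raw log files older than the retention window; return what was removed."""
    cutoff = time.time() - days * 86400
    removed = []
    for p in log_dir.glob("*.jsonl"):
        if p.stat().st_mtime < cutoff:
            p.unlink()
            removed.append(p)
    return removed
```

Because the raw JSONL files survive independently of the vector DB and SQL store, the index can always be rebuilt from them after a botched update.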
Skills vs Agents
- Skills
- Folders containing skill.md that describe behavior, prompts, triggers, examples, and tool usage.
- Small metadata (name/description/trigger) is always kept in context; full skill.md is fetched only when triggered to avoid token bloat.
- Agents
- Separate personalities/processes with their own memory and folders; can run standalone as separate bots.
- Orchestrator agent (OpenClaw) can call other agents; agents can be exposed individually (for example via Telegram).
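The "keep only metadata in context, fetch the body on trigger" pattern can be sketched as below. The `key: value` header format of skill.md is an assumption for illustration, not OpenClaw's actual schema.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

@dataclass
class SkillMeta:
    name: str
    description: str
    trigger: str
    path: Path  # where the full skill.md lives on disk

def load_skill_meta(skill_dir: Path) -> SkillMeta:
    """Read only the header lines of skill.md; the full body stays on disk."""
    md = skill_dir / "skill.md"
    fields = {}
    for line in md.read_text(encoding="utf-8").splitlines():
        if ":" not in line:
            break  # assumed convention: header ends at the first non "key: value" line
        key, _, value = line.partition(":")
        fields[key.strip().lower()] = value.strip()
    return SkillMeta(fields["name"], fields["description"], fields["trigger"], md)

def expand_if_triggered(meta: SkillMeta, message: str) -> Optional[str]:
    """Load the full skill.md only when the trigger word appears in the message."""
    if meta.trigger.lower() in message.lower():
        return meta.path.read_text(encoding="utf-8")
    return None
```

Only `name`/`description`/`trigger` occupy tokens on every turn; the (possibly long) prompt body is paid for only when the skill actually fires.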
Model routing, resilience, and cost control
- Automatic chain/fallback
- Example chain: main model (Anthropic's Claude, e.g. Opus 4.5) → lighter/cheaper models (Claude Haiku, DeepSeek and GLM variants; some model names are garbled in the auto-transcript).
- Orchestrator monitors error codes, quota exhaustion, and switches models automatically.
- “Sonnet” rule / strategic routing (rendered as “Sunnet” in the auto-transcript)
- Analyze tasks (cron jobs/workflows) and route demanding tasks to powerful LLMs and routine tasks to cheaper models, saving tokens without sacrificing quality.
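A minimal sketch of the fallback behaviour: walk an ordered chain, drop to the next model on quota or connection errors. The model names, the `QuotaExceeded` exception, and the client-callable interface are all assumptions; real providers signal limits via HTTP 429 or SDK-specific errors.

```python
import time

# Hypothetical chain mirroring the "powerful first, cheaper later" idea from the talk.
FALLBACK_CHAIN = ["claude-opus", "claude-sonnet", "claude-haiku", "glm-4"]

class QuotaExceeded(Exception):
    """Raised by a client wrapper when the provider reports a quota/limit error."""

def complete_with_fallback(prompt, clients, chain=FALLBACK_CHAIN):
    """Try each model in order; fall through to the next on quota or transient errors."""
    last_error = None
    for model in chain:
        client = clients.get(model)
        if client is None:
            continue  # model not configured on this machine
        try:
            return model, client(prompt)
        except QuotaExceeded as e:
            last_error = e  # quota exhausted: switch to a cheaper model
        except ConnectionError as e:
            last_error = e
            time.sleep(0.1)  # brief pause before trying the next provider
    raise RuntimeError(f"all models in the chain failed: {last_error}")
```

The return value carries which model actually answered, so cron jobs can log how often the orchestrator had to degrade.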
Tools and integrations
- Local Whisper (or whisper script) for audio transcription.
- OCR / PDF extraction pipeline for scanned PDFs (convert pages to images → OCR).
- TTS for voice replies (generate Telegram voice messages).
- Gemini CLI used locally as a terminal-based agent for repairing and managing the stack when other interfaces fail.
- Telegram as the primary user interface (text + voice + file exchange).
- Docker is possible, but presenter runs the stack on macOS in a folder (not in Docker).
File and media handling
- Supported/readable formats:
- PDF (including scans via OCR), DOC(X), XLS(X)/CSV, RTF, HTML, code files, XML/JSON, images, audio formats (transcribed), video (frame extraction), and more.
- Processing pipelines:
- Extract text from documents, transcribe audio via Whisper, and generate TTS audio for replies.
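A sketch of dispatching files to these pipelines by extension, plus building a local Whisper invocation. The flags shown match the openai-whisper CLI's documented options, but verify against your installed version; the routing labels are invented for illustration.

```python
from pathlib import Path

AUDIO_EXTS = {".mp3", ".ogg", ".m4a", ".wav"}

def route_file(path: Path) -> str:
    """Pick a processing pipeline from the file extension (illustrative labels)."""
    suffix = path.suffix.lower()
    if suffix in AUDIO_EXTS:
        return "transcribe"
    if suffix == ".pdf":
        return "extract-or-ocr"
    return "read-text"

def whisper_cmd(audio: Path, model: str = "base",
                out_dir: Path = Path("transcripts")) -> list:
    """Build argv for a local openai-whisper CLI run (check flags with `whisper --help`)."""
    return ["whisper", str(audio), "--model", model,
            "--output_format", "txt", "--output_dir", str(out_dir)]
```

Running Whisper locally this way keeps voice messages off paid transcription APIs, which is exactly the cost argument made in the talk.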
Monitoring, safety and operational practices
- Cron jobs used for:
- Periodic limit/quota checks.
- Backup and integrity checks after updates.
- Daily reminders and architecture checklist runs.
- Environment management
- A local .env file (transcribed as “NV”) stores API keys; never commit it to GitHub.
- Keep keys/credentials in a separate secure file; restrict which Telegram IDs the bot accepts messages from.
- Post‑update checks
- Run an automated checklist after upgrades to detect broken skills or changed integrations.
- Data collection recommendation
- Collect onboarding/context info in a single document (questionnaire → document) rather than scattering it across chat messages.
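The cron-driven checks above can be sketched as two small functions: one that flags quotas nearing their limit, one that runs a named post-update checklist and treats exceptions as failures. The `Quota` shape and the 80% warning threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

@dataclass
class Quota:
    name: str
    used: int
    limit: int

def check_quotas(quotas: Iterable[Quota], warn_ratio: float = 0.8) -> List[str]:
    """Return warning lines for any quota above the threshold; cron-friendly output."""
    alerts = []
    for q in quotas:
        ratio = q.used / q.limit if q.limit else 1.0
        if ratio >= warn_ratio:
            alerts.append(f"{q.name}: {ratio:.0%} of limit used ({q.used}/{q.limit})")
    return alerts

def run_checklist(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run each named health check after an update; a raised exception counts as failure."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results
```

A cron entry would call these periodically and push any alerts to the Telegram bot, so quota exhaustion or a broken skill surfaces before it silently stalls an agent.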
Practical issues, bugs and limitations reported
- Update regressions: after updates, SQLite stopped writing/indexing; the memory plugin stopped saving reliably, risking context loss unless raw files were retained or Postgres used.
- Memory plugin “forgetting” to write: dialogues sometimes failed to persist and required forced writes.
- Port conflicts when running multiple agents on one machine (doctor, Kaizen, Moltis examples).
- Billing/credit problems with hosted LLM services while abroad caused agents to stop (billing errors).
- Local LLM constraints: useful but require heavy GPU/RAM — impractical on small hardware for many tasks unless splitting into many tiny agents or using expensive hardware.
- Windows support: possible but less convenient than macOS/Linux; macOS/Linux preferred for automation. Docker/VM options exist.
Recommended setup and best practices
Hardware
- Mac mini (Apple Silicon, M-series processors) recommended as a practical local server. The presenter uses a Mac Studio (64 GB) and a Mac mini; M-series chips perform well.
On first install (recommended steps)
- Create a single onboarding document (questionnaire) with personal context and instructions, upload as a file and give the agent access — avoids flooding chat with scattered context.
- Configure file retention: set raw logs retention to 30 days (or longer) for safety.
- Choose storage:
- Use Postgres for business‑level/robust vector storage.
- SQLite is fine for personal/single‑user setups but fragile.
- Store API keys in a local .env file and do not commit them.
- Set up cron checks that validate the stack, perform backups, and monitor quotas/limits.
- Implement and test model chaining / automatic fallback policies.
- Use Gemini CLI (or terminal tools) as a repair tool for quick recovery when the web UI is unresponsive.
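The .env and Telegram-allowlist recommendations from the list above can be sketched together. The `ALLOWED_TELEGRAM_IDS` variable name and the `KEY=VALUE` parsing convention are assumptions for illustration, not OpenClaw's actual configuration keys.

```python
from pathlib import Path
from typing import Dict

def load_env(path: Path = Path(".env")) -> Dict[str, str]:
    """Minimal .env parser: KEY=VALUE lines, '#' comments and blank lines ignored."""
    env = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def is_allowed(telegram_id: int, env: Dict[str, str]) -> bool:
    """Accept messages only from IDs listed in a hypothetical ALLOWED_TELEGRAM_IDS var."""
    allowed = env.get("ALLOWED_TELEGRAM_IDS", "")
    return str(telegram_id) in {x.strip() for x in allowed.split(",") if x.strip()}
```

Keeping the file out of version control (via .gitignore) and rejecting unknown Telegram IDs at the door covers the two main leak paths the talk warns about.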
Skills lifecycle
- Keep a short list (names, descriptions, triggers) always in context; fetch the full skill.md only on trigger to save tokens.
- Author your own skills where possible; when importing, review for safety (prompt injection, external API exposure).
Files / audio handling
- Install Whisper and OCR scripts locally to avoid paying external services for transcription.
- Build skills to read/transcribe PDFs and extract text from scanned pages.
Concrete how-to guides implied by the video
- Design hybrid memory (raw JSON logs + markdown manual memory + vector DB + SQL).
- Migrate/attach a Postgres vector memory to OpenClaw (give agent access to Docker/Postgres, configure embeddings).
- Set up the Memory plugin: monitor JSON files, chunk, embed via LLM API, and store vectors.
- Configure cron jobs for limit/quota monitoring, backup/integrity checks, and security scans.
- Structure and author skills (skill folder + skill.md, triggers, examples).
- Build and register agents (separate personalities/folders, isolated memories).
- Set up file processing:
- PDF: image extraction → OCR.
- Audio: Whisper transcription pipeline.
- TTS for voice replies in Telegram.
- Configure model routing rules: analyze workloads and assign powerful vs cheap models.
- Integrate Gemini CLI for repair and agent management.
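The scanned-PDF step above (pages → images → OCR) can be sketched with two standard CLI tools: poppler's pdftoppm and tesseract. Both are real tools, but the resolution, language pack, and output layout here are illustrative choices; adjust them to your install.

```python
import subprocess
from pathlib import Path

def rasterize_cmd(pdf: Path, out_stem: Path) -> list:
    """pdftoppm: one PNG per page at 300 dpi, named <stem>-1.png, <stem>-2.png, ..."""
    return ["pdftoppm", "-png", "-r", "300", str(pdf), str(out_stem)]

def ocr_cmd(image: Path, langs: str = "eng+rus") -> list:
    """tesseract: takes an image and an output base; writes <base>.txt."""
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", langs]

def ocr_pdf(pdf: Path, work: Path) -> list:
    """Run the two-stage pipeline and return the per-page .txt files."""
    work.mkdir(parents=True, exist_ok=True)
    stem = work / pdf.stem
    subprocess.run(rasterize_cmd(pdf, stem), check=True)
    pages = sorted(work.glob(f"{pdf.stem}-*.png"))
    for page in pages:
        subprocess.run(ocr_cmd(page), check=True)
    return [p.with_suffix(".txt") for p in pages]
```

A skill wrapping `ocr_pdf` lets the agent answer questions about scanned documents without sending pages to an external OCR API.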
Product and model commentary / comparisons
- Claude (Anthropic) / Opus (speaker’s preferred stack)
- Speaker favors Anthropic/Claude and Opus (4.5/4.6) for attentiveness to context, code tasks, and large‑context work.
- Open-source / small LLMs
- Useful and cheap for routine tasks but weaker on large context, reasoning, and careful code generation.
- Local LLMs are promising but hardware‑intensive.
- Model chaining
- Chains of cheaper LMs can save money but require careful routing and monitored fallbacks to maintain quality.
- Risk mitigation
- Concern about service blocks, account issues, or provider changes — keep local backups and a flexible architecture to move context/skills between models.
Community and operations
- Telegram used as the primary UI and community hub (file exchange, voice messages, bot messaging). Community support recommended.
- Octopus is used alongside OpenClaw, either as an orchestrator or for specific tasks, depending on the job.
- Stack is highly personalized — the speaker stresses customization rather than copying a one‑size‑fits‑all setup.
Main speakers / sources mentioned
- Presenter: Alexey
- Tools / agents: OpenClaw (the “crab” orchestrator), Octopus, Moltis, Kaizen, Doctor (agent examples)
- Models / services: Claude / Anthropic (main), Opus (4.5/4.6), OpenAI (GPT family), Gemini (and Gemini CLI), GLM variants, DeepSeek (auto-transcribed as “DeepSic”/“Deepsic”), and “Gini” (unclear transcription, possibly Gemini)
- Transcription / TTS: Whisper (local transcription), various TTS utilities
- Storage: SQLite (lightweight), Postgres (recommended), vector databases (unnamed)
- Community / references: Telegram channel, Lieberman brothers (discussion/analysis)
Concise takeaway
A pragmatic, production‑like OpenClaw setup uses hybrid memory (raw logs + markdown + vectorized SQL), separates skills and agents, implements automatic model fallbacks, runs cron‑based integrity checks, and processes media locally (Whisper, OCR, TTS). Practical safeguards include backups, local env storage for API keys, using Postgres for long‑term storage, testing failover, and using Telegram / Gemini CLI for daily interaction and recovery.