Summary of "How to NEVER Hit Limits | Claude and Codex" (original title: "Как НИКОГДА не упираться в лимиты | Claude и Codex")

Main Topic

A guide on how not to hit context/usage limits in chat-based AI coding assistants (Claude, Claude Code, Codex, Cursor, etc.) by reducing token consumption and optimizing chat/workflows.


Key Technological Concepts Explained

Tokens

Context window / memory in a chat

Context degradation

Token cost asymmetry


Practical “Life Hacks” and Product Features (How-To)

  1. Don’t ask unnecessary questions

    • After an incorrect answer, avoid re-asking with added complaints or redundant rework; each extra turn lengthens the history and burns tokens faster.
  2. Use rollback features in Claude Code / Codex / Cursor

    • If the agent’s edits are a mess, use rollback instead of continuing from a bad state.
    • Rollback modes (conceptually):
      • Rewind conversation: rolls back dialogue history to a point.
      • Rewind code: rolls back code/files to an earlier point while keeping chat history.
      • Full rollback / fork conversation: forks the chat and restores code + conversation from before the mistake.
    • Goal: reduce wasted context and avoid forcing the model to redo everything.
  3. Edit earlier messages instead of re-asking

    • Example: change “Tula” → “Irkutsk” via an inline pencil edit so the assistant rewrites without expanding the dialogue.
  4. Ask multiple questions at once (when appropriate)

    • Bundling reduces how often the model must reprocess a long chat history.
    • Caveat: not for very large tasks.
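The saving from bundling can be estimated with simple arithmetic. This is a minimal sketch, assuming that every turn re-reads the full conversation so far as input and ignoring any prompt caching; the function name and all token counts are hypothetical illustrations, not measurements.

```python
def total_input_tokens(history_tokens: int, turns: list[tuple[int, int]]) -> int:
    """Sum the input tokens processed across a sequence of turns.

    Each turn is (question_tokens, answer_tokens). Assumption: the model
    re-reads the whole conversation so far on every turn (no caching).
    """
    total = 0
    context = history_tokens
    for question, answer in turns:
        context += question   # the new question joins the context
        total += context      # the whole context is read as input
        context += answer     # the answer is appended for later turns
    return total


# Hypothetical numbers: 50k tokens of existing history, three 100-token
# questions, each answered in 500 tokens.
separate = total_input_tokens(50_000, [(100, 500)] * 3)
bundled = total_input_tokens(50_000, [(300, 1_500)])

print(separate, bundled)
```

Under these assumptions, three separate turns read roughly three times as many input tokens as one bundled message, because the long history is processed once per turn.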
  5. Use a “listen/side question” style (non-blocking queries)

    • In Codex-like terminals: start a task, then use a “listen” mode (or similar) to ask a clarifying question without interrupting the main generation thread.
    • Note: as described in the video, pressing Escape afterwards removes both the main and side content from the visible chat.
  6. Force short, direct answers via settings

    • Configure Claude/Codex to respond without fluff to reduce output token usage.
  7. Create new chats periodically

    • Don’t cram a full multi-stage project into one conversation.
    • Recommended structure: one big task per chat; finish it, then start a new chat for the next phase.
    • Use Markdown instruction files when you need context carried forward.
  8. Use subagents for heavy analysis

    • For tasks like an SEO audit:
      • delegate deep analysis to a subagent
      • only the final summary returns to the main chat
    • This is described as available in both Claude and Codex environments.
    • Subagents/agents are positioned as instructions defining model, tools, and task scope.
  9. Use Skills (Markdown-based instructions)

    • Create reusable “skill” instructions for recurring tasks so the model doesn’t re-derive the procedure each time.
    • Skills/agents only load their title + description until needed, reducing baseline context load.
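The lazy-loading idea can be illustrated with a toy registry. This is a sketch only, not how Claude Code or Codex implement skills internally; the `Skill` class, both helper functions, and the two sample skills are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    title: str
    description: str
    body: str  # full Markdown instructions, loaded only on demand


def baseline_context(skills: list[Skill]) -> str:
    """Text that is always present: one line per skill, without the body."""
    return "\n".join(f"{s.title}: {s.description}" for s in skills)


def load_skill(skills: list[Skill], title: str) -> str:
    """Return the full instructions for one skill when the task needs it."""
    for s in skills:
        if s.title == title:
            return s.body
    raise KeyError(title)


# Hypothetical skills with long placeholder bodies.
skills = [
    Skill("seo-audit", "Audit a page for SEO issues.", "step\n" * 400),
    Skill("release-notes", "Draft release notes from commits.", "step\n" * 300),
]

always_loaded = baseline_context(skills)
on_demand = load_skill(skills, "seo-audit")
print(len(always_loaded), len(on_demand))
```

Only the short index is paid for in every session; the long body enters the context when a matching task actually comes up.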
  10. Stay under a token budget target (~120k)

    • Even with a large context window (e.g., up to ~1M tokens), the practical recommendation is to keep a conversation under about 120,000 tokens, because the history is reprocessed on every turn and responses slow down and degrade as the context grows.
    • An update is mentioned that increased context window size, but the degradation/slowdown issue is argued to remain.
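A budget like this can be tracked with a rough estimate. The sketch below assumes about four characters per token, which is a common rule of thumb for English text and not a real tokenizer; the function names and the constant `CHARS_PER_TOKEN` are hypothetical.

```python
TOKEN_BUDGET = 120_000   # target from the recommendation above
CHARS_PER_TOKEN = 4      # rough heuristic (assumption), not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


def should_start_new_chat(messages: list[str], budget: int = TOKEN_BUDGET) -> bool:
    """True once the estimated size of the conversation reaches the budget."""
    return sum(estimate_tokens(m) for m in messages) >= budget
```

Where the tool itself reports context usage, that figure is more reliable than a character-based estimate.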
  11. Convert documents/files to Markdown

    • Don’t feed PDFs/Word/HTML directly when possible.
    • Reason: PDFs/Word include lots of non-text metadata (font info, coordinates, layout, images, profiles), which becomes extra tokens.
    • Markdown is treated as “native”/preferred and can drastically reduce token footprint (example given: 15M tokens vs 8k tokens).
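As an illustration of the conversion step, here is a minimal sketch that reduces simple HTML to Markdown-like text using only the Python standard library. It assumes flat markup (headings, paragraphs, list items) and ignores tables, links, and nesting; the class and function names are hypothetical, and a dedicated converter would be a better fit for real documents.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Keep headings, paragraphs and list items; drop everything else."""

    PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- ", "p": ""}
    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.lines = []       # finished Markdown lines
        self.current = None   # tag whose text is being collected
        self.buffer = []      # text fragments of the current element
        self.skipping = 0     # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skipping += 1
        elif tag in self.PREFIX:
            self.current = tag
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skipping = max(0, self.skipping - 1)
        elif tag == self.current:
            # Collapse runs of whitespace left over from the HTML layout.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.lines.append(self.PREFIX[tag] + text)
            self.current = None

    def handle_data(self, data):
        if self.current and not self.skipping:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines)


sample = "<h1>Title</h1><p>Hello <b>world</b></p><ul><li>one</li></ul>"
print(html_to_markdown(sample))
```

Styling, scripts, and attributes never reach the output, which is where most of the token saving comes from.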
  12. Manage CLAUDE.md / AGENTS.md size (keep it small)

    • These global/project instruction files are loaded at session start.
    • If too large, they waste tokens immediately and can pollute context.
    • Rule of thumb: keep them to ~200 lines, and move large blocks (e.g., design systems) into separate Markdown files referenced when needed.
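The line-count rule of thumb is easy to check automatically. A minimal sketch, assuming the instruction files sit in the current directory under the names `CLAUDE.md` and `AGENTS.md`; the helper `oversized_files` is hypothetical.

```python
from pathlib import Path

MAX_LINES = 200  # rule of thumb from the text above


def oversized_files(files: dict[str, str], max_lines: int = MAX_LINES) -> dict[str, int]:
    """Map each file name whose content exceeds max_lines to its line count."""
    counts = {name: len(text.splitlines()) for name, text in files.items()}
    return {name: n for name, n in counts.items() if n > max_lines}


# Assumed file locations; files that do not exist are simply skipped.
paths = [Path("CLAUDE.md"), Path("AGENTS.md")]
files = {p.name: p.read_text(encoding="utf-8") for p in paths if p.exists()}
print(oversized_files(files))
```

Any file reported here is a candidate for splitting, with the large blocks moved into separate Markdown files that are referenced only when needed.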
  13. Disable unnecessary tools (MCPs/connectors/features)

    • Extra tools consume tokens and increase latency simply by being available.
    • Example: disable an MCP like a screenshot/UI-checker if not needed.
    • In the video this is done by opening the MCP list and pressing disable.
  14. Pick lightweight models for lightweight tasks

    • Use smaller models for routine edits (example tiers: Haiku/Sonnet/Opus).
    • Larger models spend more tokens, including on thinking; reserve them for complex work.
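One way to make this choice explicit is a small routing rule. The sketch below is purely illustrative: the two task signals and the threshold are assumptions, and the returned strings are tier labels rather than real model identifiers.

```python
def pick_tier(files_touched: int, needs_design: bool) -> str:
    """Pick a model tier from two hypothetical task signals."""
    if needs_design:
        return "opus"     # architecture or open-ended reasoning
    if files_touched > 3:
        return "sonnet"   # multi-file but routine change
    return "haiku"        # small, mechanical edit


print(pick_tier(files_touched=1, needs_design=False))
```

The exact signals matter less than the habit of deciding the tier before starting the task.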
  15. Turn on planning mode before execution

    • Enable a plan-first workflow:
      • 1) the agent reads inputs/specs
      • 2) it generates a plan without changing code
      • 3) you approve or adjust the plan
      • 4) only then does it execute
    • Goal: reduce major mistakes and rework, saving tokens.
  16. Use a planning-focused plugin (“Superpowers”)

    • A plugin (reviewed elsewhere by the speaker) that uses built-in skills for a structured workflow:
      • brainstorming → planning → implementation → testing
    • Recommended mainly for larger tasks; considered overkill for trivial ones.

“Stretching” Usage Limits in Claude (5-Hour Window)

Daily 5-hour limits window

Workaround: Claude Routines

Status update on limits (recent change described)


Speaker’s Auxiliary Resources / Recommendations


Main Speakers / Sources

Category: Technology

