Summary of "Как НИКОГДА не упираться в лимиты | Claude и Codex" ("How to NEVER Hit the Limits | Claude and Codex")

Main Topic

A guide on how not to hit context/usage limits in chat-based AI coding assistants (Claude, Claude Code, Codex, Cursor, etc.) by reducing token consumption and optimizing chat/workflows.

Key Technological Concepts Explained

Tokens

Tokens are the basic units of text the model processes; punctuation and spaces count toward the total, not just words.
Approximate rule of thumb:
- ~3/4 of an English word per token
Language note:
- Russian is less efficient: 1 Russian word ≈ 2–3 tokens
- So Russian text can cost ~2–3× more tokens.
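
The rules of thumb above can be turned into a rough estimator. This is a minimal sketch, not a real tokenizer: the name `estimate_tokens` and the language keys are hypothetical, and the Russian figure of 2.5 tokens per word is only the midpoint of the 2–3 range quoted above.

```python
# Rough token estimate from a word count (an approximation, not a tokenizer).
# Assumptions: 1 token is about 3/4 of an English word;
# 1 Russian word is about 2.5 tokens (midpoint of the quoted 2-3 range).
TOKENS_PER_WORD = {
    "en": 1 / 0.75,
    "ru": 2.5,
}


def estimate_tokens(text: str, language: str = "en") -> int:
    """Estimate tokens from the number of whitespace-separated words."""
    words = len(text.split())
    return round(words * TOKENS_PER_WORD[language])


print(estimate_tokens("Refactor the login form", "en"))
print(estimate_tokens("Перепиши форму входа", "ru"))
```

For exact counts, use the tokenizer or token-counting facility of the model you are working with; the ratio varies by model.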

Context window / memory in a chat

The model must re-read all prior messages every time it generates a response.
In long chats, most tokens are spent on re-processing history, not producing the new answer.
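
To see why history dominates, here is a small model of a chat in which the full history is re-sent on every turn. It is a sketch under a simplifying assumption that is not from the video: every message and every reply has the same length (`tokens_per_message`, a hypothetical parameter).

```python
def history_share(turns: int, tokens_per_message: int = 500) -> float:
    """Share of all input tokens that is re-read history rather than new text.

    Assumption: each user message and each reply is `tokens_per_message` long,
    and the whole history is sent as input again on every turn.
    """
    if turns <= 0:
        raise ValueError("turns must be positive")
    total_input = 0
    history_input = 0
    history = 0  # tokens already in the chat before the current turn
    for _ in range(turns):
        total_input += history + tokens_per_message  # old history + new message
        history_input += history
        history += 2 * tokens_per_message  # message and reply join the history
    return history_input / total_input


for n in (2, 10, 100):
    print(n, round(history_share(n), 3))
```

Under these assumptions the share grows quickly with the number of turns, which is the effect the summary describes.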

Context degradation

As the chat grows, the model can “forget” or become less sharp.
The video puts the point where the model becomes noticeably “dull” at roughly 100 messages.

Token cost asymmetry

Input tokens are cheaper than output tokens (claim: output can be ~5× more expensive).
Because the model generates output token by token, with each new token conditioned on everything generated so far, long answers cost significantly more.
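
The asymmetry is easy to quantify. The sketch below expresses a request's cost in "input-token equivalents", taking the ~5× multiplier from the claim above as a default; `weighted_cost` is a hypothetical helper, and actual price ratios depend on the model and plan.

```python
def weighted_cost(input_tokens: int, output_tokens: int,
                  output_multiplier: float = 5.0) -> float:
    """Cost in input-token equivalents, with output tokens weighted more heavily."""
    return input_tokens + output_tokens * output_multiplier


# Same 1,000-token prompt, short answer versus long answer.
print(weighted_cost(1_000, 200))    # short, direct answer
print(weighted_cost(1_000, 2_000))  # long, verbose answer
```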

Practical “Life Hacks” and Product Features (How-To)

Don’t ask unnecessary questions
- After an incorrect answer, avoid follow-ups that only complain or repeat the request; each one adds to the history and burns tokens faster.
Use rollback features in Claude Code / Codex / Cursor
- If the agent’s edits are a mess, use rollback instead of continuing from a bad state.
- Rollback modes (conceptually):
  - Rewind conversation: rolls back dialogue history to a point.
  - Rewind code: rolls back code/files to an earlier point while keeping chat history.
  - Full rollback / fork conversation: forks the chat and restores code + conversation from before the mistake.
- Goal: reduce wasted context and avoid forcing the model to redo everything.
Edit earlier messages instead of re-asking
- Example: change “Tula” → “Irkutsk” via an inline pencil edit so the assistant rewrites without expanding the dialogue.
Ask multiple questions at once (when appropriate)
- Bundling reduces how often the model must re-run long chat history.
- Caveat: not for very large tasks.
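
The saving from bundling can be estimated with a small comparison. This is an illustrative sketch with hypothetical function names; it assumes the full history is re-sent on every turn and ignores the length of the bundled answer.

```python
def input_tokens_separate(history: int, question: int, answer: int, n: int) -> int:
    """Input tokens when n questions are asked one at a time."""
    total = 0
    for _ in range(n):
        total += history + question   # history is re-read for every question
        history += question + answer  # the exchange is appended to the history
    return total


def input_tokens_bundled(history: int, question: int, n: int) -> int:
    """Input tokens when the same n questions are sent in one message."""
    return history + n * question     # history is re-read only once


print(input_tokens_separate(50_000, 100, 300, 3))
print(input_tokens_bundled(50_000, 100, 3))
```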
Use a “listen/side question” style (non-blocking queries)
- In Codex-like terminals: start a task, then use a “listen” mode (or similar) to ask a clarifying question without interrupting the main generation thread.
- Note: after using Escape, both the main and side content are removed from the visible chat (as described).
Force short, direct answers via settings
- Configure Claude/Codex to respond without fluff to reduce output token usage.
Create new chats periodically
- Don’t cram a full multi-stage project into one conversation.
- Recommended structure: one big task per chat; finish it, then start a new chat for the next phase.
- Use Markdown instruction files when you need context carried forward.
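
One way to carry context into a new chat is a short handoff file. The sketch below only builds the Markdown text; the section names and the `build_handoff` helper are hypothetical, not a format the video or any tool prescribes.

```python
from typing import List


def build_handoff(task: str, decisions: List[str], next_steps: List[str]) -> str:
    """Build a compact Markdown summary to paste or attach in the next chat."""
    lines = [f"# Handoff: {task}", "", "## Decisions made"]
    lines += [f"- {item}" for item in decisions]
    lines += ["", "## Next steps"]
    lines += [f"- {item}" for item in next_steps]
    return "\n".join(lines) + "\n"


print(build_handoff(
    "Checkout page",
    ["Use SQLite for local storage"],
    ["Add payment form validation"],
))
```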
Use subagents for heavy analysis
- For tasks like an SEO audit:
  - delegate deep analysis to a subagent
  - only the final summary returns to the main chat
- This is described as available in both Claude and Codex environments.
- Subagents/agents are positioned as instructions defining model, tools, and task scope.
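
The idea behind delegation can be shown with a toy model: the heavy material stays in the subagent's own context, and the main chat receives one summary line. This is a conceptual sketch only; `run_subagent` and `title_audit` are hypothetical stand-ins, not the API of Claude Code or Codex.

```python
from typing import Callable, List


def run_subagent(task: str, documents: List[str],
                 worker: Callable[[str, List[str]], str]) -> str:
    """Do the heavy work in a separate context and return only the summary."""
    private_context = list(documents)  # never copied into the main chat
    return worker(task, private_context)


def title_audit(task: str, pages: List[str]) -> str:
    """Stand-in for a model call: counts pages that have no <title> tag."""
    missing = sum(1 for page in pages if "<title>" not in page)
    return f"{task}: {missing} of {len(pages)} pages have no <title>"


pages = ["<html><title>Home</title></html>", "<html><h1>Blog</h1></html>"]
main_chat = ["User: audit the site"]
main_chat.append(run_subagent("SEO audit", pages, title_audit))
print(main_chat)
```

The main chat grows by one short line regardless of how many pages the subagent had to read.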
Use Skills (Markdown-based instructions)
- Create reusable “skill” instructions for recurring tasks so the model doesn’t re-derive the procedure each time.
- Skills/agents only load their title + description until needed, reducing baseline context load.
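
The "title + description until needed" behaviour amounts to lazy loading. The following sketch models it with a hypothetical `Skill` record; it illustrates the principle and is not how any particular tool stores skills.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Skill:
    title: str
    description: str
    body: str  # full Markdown instructions, loaded only on demand


def baseline_context(skills: List[Skill]) -> str:
    """What sits in context at all times: one short line per skill."""
    return "\n".join(f"{s.title}: {s.description}" for s in skills)


def load_skill(skills: List[Skill], title: str) -> str:
    """Pull in the full instructions only when the task calls for them."""
    for skill in skills:
        if skill.title == title:
            return skill.body
    raise KeyError(title)


skills = [Skill("seo-audit", "Checklist for auditing a site", "1. Crawl...\n" * 200)]
print(len(baseline_context(skills)), len(load_skill(skills, "seo-audit")))
```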
Stay under a token budget target (~120k)
- Even if the context window is large (e.g., up to ~1M tokens), the practical recommendation is limiting usage to around 120,000 tokens due to ongoing history reprocessing and slowdown/context degradation.
- An update is mentioned that increased context window size, but the degradation/slowdown issue is argued to remain.
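
A budget like this is simple to track if per-message token counts are available. The sketch below assumes you already have those counts (for example, from your tool's usage display); the helper names are hypothetical, and 120,000 is the target quoted above.

```python
from typing import List

TOKEN_BUDGET = 120_000  # practical target from the summary, not a hard limit


def remaining_budget(message_tokens: List[int], budget: int = TOKEN_BUDGET) -> int:
    """Tokens left before the chat reaches the budget (negative if over)."""
    return budget - sum(message_tokens)


def should_start_new_chat(message_tokens: List[int], budget: int = TOKEN_BUDGET) -> bool:
    """True once the accumulated history has used up the budget."""
    return remaining_budget(message_tokens, budget) <= 0


chat = [40_000, 35_000, 30_000]
print(remaining_budget(chat), should_start_new_chat(chat))
```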
Convert documents/files to Markdown
- Don’t feed PDFs/Word/HTML directly when possible.
- Reason: PDFs/Word include lots of non-text metadata (font info, coordinates, layout, images, profiles), which becomes extra tokens.
- Markdown is treated as “native”/preferred and can drastically reduce token footprint (example given: 15M tokens vs 8k tokens).
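
For HTML, much of the overhead is markup, styles, and scripts rather than content. The sketch below uses only Python's standard `html.parser` to keep headings, paragraphs, and list items as Markdown-like lines. It is deliberately minimal: it assumes simple, flat HTML, and inline tags such as `<b>` would split a paragraph across lines. The `MarkdownExtractor` class and `html_to_markdown` function are illustrative names, not part of any tool mentioned in the video.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Keeps visible text; drops attributes, <script> and <style> content."""

    PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- "}
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.lines = []
        self._prefix = ""
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.PREFIX:
            self._prefix = self.PREFIX[tag]

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and not self._skip_depth:
            self.lines.append(self._prefix + text)
            self._prefix = ""


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


page = ('<h1 style="font-size:30px">Pricing</h1><style>h1{color:red}</style>'
        '<p>Plans start at $10.</p><ul><li>Pro</li><li>Max</li></ul>')
print(html_to_markdown(page))
```

For PDF or Word files, a dedicated converter is needed; the same principle applies: pass the model the text, not the layout data.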
Manage CLAUDE.md / AGENTS.md size (keep it small)
- These global/project instruction files are loaded at session start.
- If too large, they waste tokens immediately and can pollute context.
- Rule of thumb: keep them to ~200 lines, and move large blocks (e.g., design systems) into separate Markdown files referenced when needed.
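
The ~200-line rule of thumb is easy to check automatically. A minimal sketch, assuming the file content has already been read into a string; `lines_over_limit` is a hypothetical helper.

```python
MAX_LINES = 200  # rule of thumb from the summary


def lines_over_limit(text: str, max_lines: int = MAX_LINES) -> int:
    """How many lines the instruction file exceeds the limit by (0 if it fits)."""
    return max(0, len(text.splitlines()) - max_lines)


instructions = "\n".join(f"rule {i}" for i in range(260))
print(lines_over_limit(instructions))
```

A non-zero result is a cue to move large blocks into separate Markdown files that are referenced only when needed.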
Disable unnecessary tools (MCPs/connectors/features)
- Extra tools consume tokens and increase latency simply by being available.
- Example: disable an MCP like a screenshot/UI-checker if not needed.
- Described as navigating to MCP list and pressing disable.
Pick lightweight models for lightweight tasks
- Use smaller models for routine edits (example tiers: Haiku/Sonnet/Opus).
- Bigger models cost more tokens/thinking; reserve them for complex work.
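
A simple routing rule makes the habit explicit. The sketch below returns one of the three tier names from the example above; the criteria and the threshold of three files are hypothetical and should be tuned to your own work.

```python
def pick_model_tier(files_touched: int, needs_design_decisions: bool) -> str:
    """Pick the smallest tier that is likely to handle the task."""
    if needs_design_decisions:
        return "opus"    # complex work: architecture, tricky debugging
    if files_touched > 3:
        return "sonnet"  # moderate, multi-file changes
    return "haiku"       # routine edits


print(pick_model_tier(1, False))
print(pick_model_tier(8, False))
print(pick_model_tier(2, True))
```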
Turn on planning mode before execution
- Enable a plan-first workflow:
  1) the agent reads inputs/specs
  2) it generates a plan without changing code
  3) you approve or adjust the plan
  4) only then does it execute
- Goal: reduce major mistakes and rework, saving tokens.
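
The essential property of the plan-first workflow is that nothing runs before approval. The sketch below captures that gate with plain callables; all three parameters are hypothetical stand-ins for the agent's planning step, your review, and the agent's execution step.

```python
from typing import Callable, List


def plan_then_execute(make_plan: Callable[[], List[str]],
                      approve: Callable[[List[str]], bool],
                      execute: Callable[[str], str]) -> List[str]:
    """Generate a plan, ask for approval, and only then run the steps."""
    plan = make_plan()       # planning changes no code
    if not approve(plan):
        return []            # rejected plan: nothing is executed
    return [execute(step) for step in plan]


results = plan_then_execute(
    make_plan=lambda: ["add validator", "update form"],
    approve=lambda plan: len(plan) <= 5,
    execute=lambda step: f"done: {step}",
)
print(results)
```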
Use a planning-focused plugin (“Superpowers”)
- A plugin (reviewed elsewhere by the speaker) that uses built-in skills for a structured workflow:
  - brainstorming → planning → implementation → testing
- Recommended mainly for larger tasks; considered overkill for trivial ones.

“Stretching” Usage Limits in Claude (5-Hour Window)

Daily 5-hour limits window

Claude meters usage in a 5-hour limits window (the Pro, Max, Team, and Enterprise plans are mentioned).
The window starts from the first message sent.
If you exhaust the quota early, it is restored once the 5-hour window ends.

Workaround: Claude Routines

Create a routine that sends a small trigger message early (e.g., at 6:00 AM). The 5-hour window then starts before your working day does, so it resets sooner while you work and interruptions are shorter.
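
The effect of an early trigger can be worked out with simple date arithmetic. This sketch assumes a fixed 5-hour window that starts at the first message, as described above; the 9:00 start and the 10:30 limit hit are illustrative values, and `wait_after_hitting_limit` is a hypothetical helper.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)


def wait_after_hitting_limit(window_start: datetime, limit_hit: datetime) -> timedelta:
    """Time left until the window resets, measured from when the limit was hit."""
    reset = window_start + WINDOW
    return max(reset - limit_hit, timedelta(0))


day = datetime(2026, 1, 5)
limit_hit = day.replace(hour=10, minute=30)

# Without a routine: the window starts with the first real message at 9:00.
print(wait_after_hitting_limit(day.replace(hour=9), limit_hit))
# With a routine: a trigger message at 6:00 starts the window early.
print(wait_after_hitting_limit(day.replace(hour=6), limit_hit))
```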

Status update on limits (recent change described)

On May 6, 2026, Anthropic reportedly expanded infrastructure (Memphis data center access) and doubled the 5-hour limits for Claude products (Pro/Max/Team/Enterprise).
The same update reportedly removed peak-hour restrictions and raised rate limits for Opus models, making the limits harder to hit.

Speaker’s Auxiliary Resources / Recommendations

The speaker mentions creating full guides for:
- Claude Code
- Codex
Promotes their Telegram channel for additional tips, including legal/web development info and recurring AI news/podcast releases.
Mentions they may add missing items to the Telegram channel.

Main Speakers / Sources

Primary speaker: the video’s narrator/author (unnamed in subtitles).
Referenced sources: unnamed developer(s) who measured token usage in long chats (example claim: ~98.5% of tokens spent on processing prior responses).
Company referenced: Anthropic (Claude).
Tools/products referenced: Claude, Claude Code, Codex, Cursor, plus MCPs, Agents/Subagents, Skills, Claude Routines.
