Summary of "The AI bubble is bursting"

AI costs are rising and companies are tightening access. The episode argues that AI is becoming less “unlimited” and more metered and expensive, leading to backlash from heavy users and organizations that were relying on large-scale AI usage.
Major pricing change: GitHub Copilot moves from request-based to token-based pricing.
- Previously, Copilot effectively charged per request—so long chats and agentic workflows stayed relatively affordable even as token counts grew.
- With the new model, pricing is based on tokens (input/output length), so longer messages and growing conversation threads become more expensive per request.
- Copilot also increased pricing for newer models (examples mentioned: “Opus 47/48”-type models), sometimes by large multipliers—making previously affordable workflows dramatically more costly.
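The difference between the two billing models can be sketched in a few lines of Python. This is a minimal illustration, not GitHub's actual billing logic: every price constant is a hypothetical placeholder, and the sketch assumes the full conversation history is resent as input on each request with no caching.

```python
from typing import List, Tuple

# All prices are hypothetical placeholders, not GitHub's actual rates.
PRICE_PER_REQUEST = 0.04       # flat fee per request (assumed)
INPUT_PRICE_PER_MTOK = 10.0    # dollars per million input tokens (assumed)
OUTPUT_PRICE_PER_MTOK = 50.0   # dollars per million output tokens (assumed)


def per_request_costs(num_turns: int) -> List[float]:
    """Flat pricing: every turn costs the same, however long the thread is."""
    return [PRICE_PER_REQUEST] * num_turns


def token_based_costs(turns: List[Tuple[int, int]]) -> List[float]:
    """Token pricing: each turn is billed for the whole thread so far.

    `turns` holds (new_input_tokens, output_tokens) for each turn. The sketch
    assumes the full history is resent as input on every request.
    """
    costs = []
    history_tokens = 0
    for new_input, output in turns:
        input_tokens = history_tokens + new_input
        cost = (input_tokens * INPUT_PRICE_PER_MTOK
                + output * OUTPUT_PRICE_PER_MTOK) / 1_000_000
        costs.append(cost)
        # The next request carries this turn's input and output as context.
        history_tokens = input_tokens + output
    return costs


if __name__ == "__main__":
    turns = [(1_000, 500)] * 3  # three identical turns
    print("per request:", per_request_costs(len(turns)))
    print("token based:", token_based_costs(turns))
```

Under the flat model each turn costs the same. Under the token model the per-turn cost climbs as the thread grows, even though each new message is the same size.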
Real-world impact shown with bills and reported user reactions.
- The host explains how to pull monthly Copilot usage reports via GitHub settings → AI usage, then upload the exported CSVs to preview the costs.
- A comparison around the pricing switch:
  - March (per-request pricing): about $440 total.
  - The same usage under token-based pricing would have been nearly $1,800.
- April is described as cheaper out of pocket, but it would have been much worse under token pricing because of model access changes: an older model was disabled, forcing usage onto a more expensive model tier.
- Reported user concerns include:
  - People hitting monthly budgets quickly after the June 1 switch.
  - “Caching” lasting only briefly, so even small follow-ups sent later can become disproportionately expensive.
  - Business reactions such as Uber allegedly capping employees’ AI spend at ~$1,500/month.
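The bill comparison and the caching complaint above can both be put into rough numbers. The sketch below uses the approximate March figures from the episode ($440 and $1,800) for the multiplier. The cache lifetime, cached-token discount, and token price are hypothetical assumptions chosen only to show the shape of the effect.

```python
def bill_multiplier(old_total: float, new_total: float) -> float:
    """How many times larger the new bill is than the old one."""
    return new_total / old_total


def followup_input_cost(
    context_tokens: int,
    new_tokens: int,
    seconds_since_last_request: float,
    cache_ttl_seconds: float = 300.0,      # assumed cache lifetime
    input_price_per_mtok: float = 10.0,    # assumed price per million tokens
    cached_discount: float = 0.1,          # assumed: cached tokens cost 10%
) -> float:
    """Input cost of a follow-up message on an existing thread.

    If the follow-up arrives while the cache is still warm, the existing
    context is billed at the discounted rate. Otherwise the whole context
    is billed at full price again.
    """
    cache_is_warm = seconds_since_last_request <= cache_ttl_seconds
    context_rate = input_price_per_mtok * (cached_discount if cache_is_warm else 1.0)
    total = context_tokens * context_rate + new_tokens * input_price_per_mtok
    return total / 1_000_000


if __name__ == "__main__":
    # Approximate March totals quoted in the episode.
    print(f"multiplier: {bill_multiplier(440, 1800):.2f}x")
    # Same 200-token follow-up on a 100k-token thread, sent 1 min vs 1 h later.
    print("warm cache:", followup_input_cost(100_000, 200, 60))
    print("cold cache:", followup_input_cost(100_000, 200, 3600))
```

With these assumed values, the same short follow-up costs roughly ten times more once the cache has expired, because the bill is dominated by the existing context rather than the new message.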
Industry-wide pattern: providers are “clamping down” and monetizing differently.
- The episode links Copilot’s shift to similar moves by other vendors (e.g., Anthropic/Claude and others) to control heavy or “unmetered” use.
- The argument: even if some competitors seem more permissive (OpenAI's Codex positioning is cited), most will adopt monetization constraints because cloud inference is expensive and companies want user lock-in and data.
New model strategies: providers ship “cheaper” internal models for tasks.
- Microsoft Build is highlighted: Microsoft announced several models (7 mentioned), including:
  - A fast/cheap “code flash”-type model for smaller developer “grunt work” tasks.
  - A “thinking” model (not yet priced/revealed fully in the discussion).
- The episode suggests a motive, pointing to earlier claims that Microsoft limited internal use of Claude: Microsoft may be building enough capability to rely more on its own models.
Google’s parallel move and timing speculation.
- Google I/O is referenced: Gemini 3.5 Flash (fast/cheap) announced, with Gemini 3.5 Pro expected later (June 2026 mentioned).
- Hosts speculate the “Pro” model may sit closer to the expensive “big model” tier—if comparable, it would reshape cost tradeoffs.
Proposed “solution”: shift from agentic automation to cheaper models + more manual coding.
- As models become cheaper, they may become less capable, requiring developers to do more work themselves rather than letting AI fully write/run workflows.
- The “AI bubble” framing suggests the industry is moving toward sustainable economics.
More alternatives: local models, open-weight models, and low-cost cloud inference.
- The episode recommends running smaller local/open-weight models (examples referenced: Gemma 4, Qwen 3.6, Minix-style options, etc.), described as roughly “Sonnet/Haiku-level” for coding tasks.
- Services like OpenRouter are mentioned as a way to run these models via third-party GPUs at very low cost (“pennies”).
- A personal stance from the contributor: paying for plans that include usage is acceptable, but token-based pricing becomes painful unless used for very narrow tasks.
Debate among tools: Cursor Composer praised as a cheaper workflow path.
- One host argues people may be overlooking Cursor Composer, claiming it can outperform some Microsoft offerings in speed/quality while being significantly cheaper—occasionally escalating to higher-tier models when needed.
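The "cheap model first, escalate when needed" workflow described here can be sketched as a simple router. This is an illustrative pattern only, not how Cursor Composer works internally: `cheap_model`, `strong_model`, and `is_good_enough` are hypothetical callables standing in for real model calls and a real quality check.

```python
from typing import Callable, Tuple

Model = Callable[[str], str]


def answer_with_escalation(
    prompt: str,
    cheap_model: Model,
    strong_model: Model,
    is_good_enough: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try the cheap model first and escalate only if its draft fails the check.

    Returns (tier_used, answer), where tier_used is "cheap" or "strong".
    """
    draft = cheap_model(prompt)
    if is_good_enough(draft):
        return "cheap", draft
    # The cheap attempt is still paid for, so escalation should be the exception.
    return "strong", strong_model(prompt)


if __name__ == "__main__":
    # Stub models for demonstration; real ones would call an inference API.
    cheap = lambda p: "" if "refactor" in p else f"cheap answer to: {p}"
    strong = lambda p: f"strong answer to: {p}"
    non_empty = lambda text: bool(text.strip())

    print(answer_with_escalation("rename a variable", cheap, strong, non_empty))
    print(answer_with_escalation("refactor the module", cheap, strong, non_empty))
```

In practice the quality check is the hard part. Running tests or a linter on generated code is one plausible choice, though the episode does not specify one.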
Hardware improvements are another long-term pressure valve.
- The episode argues the “bubble” isn’t only pricing—hardware efficiency should keep improving and models will get smaller.
- They connect this to major tech cycles (personal computing, smartphones) and predict an AI hardware revolution where local/consumer-grade setups reduce reliance on expensive cloud inference.
- Analogy: companies currently “rent” expensive compute; in the future, costs may shift to a one-time local hardware investment rather than ongoing per-token spending.
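The rent-versus-buy analogy comes down to a break-even calculation. The sketch below is a back-of-the-envelope model under stated assumptions: the hardware price and local running cost are hypothetical, the cloud figure reuses the roughly $440 monthly bill mentioned earlier, and it ignores differences in model quality, depreciation, and resale value.

```python
from typing import Optional


def break_even_months(
    hardware_cost: float,
    monthly_cloud_spend: float,
    monthly_local_cost: float = 0.0,
) -> Optional[float]:
    """Months until a one-time hardware purchase beats ongoing cloud spend.

    Returns None when local running costs (power, maintenance) are at least
    as high as the cloud bill, in which case the purchase never pays off.
    """
    monthly_saving = monthly_cloud_spend - monthly_local_cost
    if monthly_saving <= 0:
        return None
    return hardware_cost / monthly_saving


if __name__ == "__main__":
    # Hypothetical: a $3,000 workstation, $40/month to run, replacing
    # a $440/month cloud bill.
    print(break_even_months(3000, 440, 40))
```

The shorter the break-even period, the stronger the case for local hardware. Heavy users on token-based pricing would reach it sooner than light users.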