Summary of "The AI Bubble Is Getting Worse Faster Than Expected..."
Overview
Topic: The AI “bubble” accelerating due to unprofitable, agent-driven usage of large language models (LLMs) and the escalating cost of the compute behind them.
Key thesis: Agentic workflows (LLMs paired with harnesses/agents that can act on systems) multiply compute usage per user and can cause runaway costs, forcing providers to tighten policies or change billing. This will correct growth expectations and business models, though it won’t eliminate AI.
Risk highlight: leaked API tokens or third‑party agents tied to a billing account can cause runaway billing because agentic usage is extremely compute‑intensive.
Key incidents and policy changes
- Anthropic / Claude
- Recent source-code leak reported.
- Policy tightening to restrict or charge extra for third‑party “harnesses”/agent connections that use customers’ API tokens, aimed at preventing unpaid heavy usage.
- The policy change underscores the financial risk when third‑party agents or leaked tokens are used to drive heavy, unmetered compute on cloud LLMs.
Technical concepts
- Agentic AI / harnesses
- An LLM (“brain”) attached to a harness/agent (“body”) that has system-level access: run commands, browse, download files, manage local apps, etc.
- Examples: Hermes agent, “OpenClaw”-style agents.
- Tokens & prompts
- Agent prompts can be hundreds to thousands of tokens long; each token contributes to cloud LLM billing, making sessions expensive.
- Local vs cloud LLMs
- Running models locally (OpenWebUI, local weights) avoids cloud telemetry and per‑use billing and offers better privacy, but requires heavy local hardware and technical skill.
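The billing point above can be made concrete: per-session cost is just token counts times per-token prices. A minimal sketch, with placeholder prices (not any provider's actual rates) and assumed token counts for a short chat versus an agent loop that re-sends a long prompt, tool schemas, and accumulated context on every step:

```python
def session_cost(input_tokens: int, output_tokens: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimate one session's cost from token counts and per-million-token prices."""
    return (input_tokens / 1e6) * price_in_per_m + (output_tokens / 1e6) * price_out_per_m

# Placeholder prices: $3 per million input tokens, $15 per million output tokens.
chat = session_cost(2_000, 500, 3.0, 15.0)
# Assumed agent run: ~50 tool-call steps, each re-sending ~20k tokens of context.
agent = session_cost(50 * 20_000, 50 * 1_000, 3.0, 15.0)

print(f"chat session:  ${chat:.4f}")
print(f"agent session: ${agent:.2f}")
print(f"ratio: {agent / chat:.0f}x")
```

Even with these made-up numbers, the agentic session costs hundreds of times more than the chat session, which is the crux of the subsidy problem discussed below.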
Demonstrations and practical points (what was shown)
The creator (Mudahar) demonstrated running an agent locally:
- Setup
- Hermes agent paired with an open‑source model (Qwen 3.5 27B) running locally.
- Tasks performed by the agent
- Searched the web and downloaded 4K Cyberpunk wallpapers to a desktop folder.
- Researched and applied fixes to OpenWebUI following best practices.
- Performed maintenance tasks: updated containers, checked for malware, uploaded files, and responded to remote messaging requests.
- Practical takeaway: local agents can perform a wide range of tasks, but they consume significant local resources.
Hardware metrics observed
- Example system: 128 GB RAM, Ryzen 9 5950X, RTX 4090.
- Resource usage while running LLMs:
- RTX 4090: ~21.8 GB of VRAM in use with the model and context loaded; GPU fans ramp up significantly.
- CPU: spikes to 100% under load.
- Overall load comparable to modern AAA gaming — heavy but feasible on high‑end consumer hardware.
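The ~21.8 GB VRAM figure is consistent with simple weight-size arithmetic. A back-of-envelope sketch (the bit-widths and overhead below are assumptions, not measurements): a ~27B-parameter model only fits on a 24 GB card once quantized.

```python
def model_vram_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 0.0) -> float:
    """Back-of-envelope VRAM estimate: parameters x (bits / 8 bits per byte),
    plus assumed KV-cache and runtime overhead."""
    return params_b * (bits_per_weight / 8) + overhead_gb

# A ~27B model at 16-bit would need ~54 GB -- far beyond a 24 GB RTX 4090.
full = model_vram_gb(27, 16)
# At ~5 bits/weight (a common quantization level) plus an assumed ~4 GB of
# KV cache and buffers, the total lands in the same ballpark as the ~21.8 GB observed.
quant = model_vram_gb(27, 5, overhead_gb=4.0)

print(f"fp16 weights:  {full:.1f} GB")
print(f"quantized run: {quant:.1f} GB")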
Open‑source models and tooling mentioned
- Models and interfaces
- Qwen 3.5 / Qwen 3.5 27B (Alibaba's open‑source model family)
- Gemma 4 (Google’s open‑source model, intended for mobile/edge)
- OpenWebUI (local UI)
- Frameworks and agents
- Hermes (agent)
- “OpenClaw”-style harnesses
- NVIDIA NeMo / NeMo agents
- Note: These are viable alternatives to cloud LLMs for privacy and cost control, but they remain compute‑heavy.
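For readers weighing the local route, one common pattern (not from the video itself) is serving a model locally, e.g. via Ollama, which exposes an HTTP API on localhost port 11434 by default, and querying it over plain HTTP. A minimal sketch; the model tag is a placeholder for whatever you have pulled locally:

```python
import json
import urllib.request

# Ollama's default local generation endpoint (assumes a local Ollama server).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generation request for a locally served model."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# "your-model-tag" is a placeholder. Nothing leaves the machine except this
# localhost request -- which is the privacy argument for local models.
req = build_request("your-model-tag", "Summarize agentic AI in one sentence.")
# response = urllib.request.urlopen(req)  # uncomment with a running Ollama server
```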
Economic and industry analysis
- Agentic workflows dramatically increase per‑user compute cost compared to plain prompt-based usage (searches or short prompts).
- Many consumer plans implicitly subsidize heavy usage (e.g., “$20/mo” plans), and agent-driven usage exposes those subsidies as unsustainable.
- Hobbyists and “AI bros” running many agents are driving up provider costs; companies that previously subsidized heavy usage are now forced to restrict access or change billing.
- Corporate pressure: investors and CEOs face margin compression. Claims that agents will replace human workers are premature when many agent runs fail yet still consume compute.
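The subsidy argument can be illustrated with back-of-envelope numbers (all assumed, purely illustrative): a flat-rate plan is profitable while each user's metered compute stays small, and collapses once agent loops multiply per-session cost.

```python
def monthly_margin(plan_price: float, sessions_per_month: int, cost_per_session: float) -> float:
    """Provider margin per user on a flat-rate plan: revenue minus metered compute cost."""
    return plan_price - sessions_per_month * cost_per_session

# Assumed figures: a $20/mo plan, 100 sessions a month.
light = monthly_margin(20.0, 100, 0.01)   # casual chat user: provider keeps most of the $20
heavy = monthly_margin(20.0, 100, 3.75)   # agent-running user: deeply unprofitable

print(f"light user margin: ${light:.2f}")
print(f"heavy user margin: ${heavy:.2f}")
```

A handful of heavy agentic users can erase the margin from many light ones, which is why providers are moving to restrict harnesses or re-price that usage.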
- Conclusion: the bubble will adjust as costs and business models are reassessed; this will temper unrealistic expectations but not remove AI’s long‑term relevance.
Security and privacy concerns
- Full‑access agents are risky:
- System‑level access can leak sensitive data or API tokens.
- Agents may perform unexpected actions if not properly sandboxed.
- Cloud LLMs:
- Sending data to third parties raises telemetry and privacy concerns.
- Leaked API tokens tied to cloud billing can cause runaway charges.
- Local LLMs:
- Preserve privacy and reduce third‑party exposure.
- Shift cost and responsibility to the user (hardware, maintenance, security).
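As a rough illustration of the sandboxing point, even basic guardrails reduce an agent's blast radius: execute agent-proposed commands without shell interpretation, with a hard timeout, and inside a throwaway working directory. This limits accidents, not a determined attacker; real isolation needs containers or VMs plus filesystem and network policy.

```python
import subprocess
import tempfile

def run_agent_command(argv: list[str], timeout_s: int = 30) -> subprocess.CompletedProcess:
    """Run an agent-proposed command with basic guardrails."""
    with tempfile.TemporaryDirectory() as scratch:
        return subprocess.run(
            argv,                 # list form: no shell metacharacter expansion
            cwd=scratch,          # keep file writes out of the real home directory
            timeout=timeout_s,    # kill runaway commands
            capture_output=True,  # don't let output flood the terminal
            text=True,
        )

result = run_agent_command(["echo", "hello"])
print(result.stdout.strip())
```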
Guidance and recommendations
- Be cautious with API tokens and third‑party harnesses:
- Rotate and limit tokens, monitor usage and billing alerts.
- Avoid embedding long‑lived credentials in public or semi‑public agent setups.
- Consider local open‑source models and agents for privacy and to better understand true computational costs.
- Learn agent setup fundamentals and resource requirements before relying on cloud subscriptions for mission‑critical or high‑volume workflows.
- Monitor provider policy changes and pricing updates — agentic usage is increasingly scrutinized and may incur new fees.
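One way to act on the "monitor usage and billing alerts" advice is a local spend guard that refuses further API calls once a daily budget is reached. A minimal sketch with assumed numbers; it is a client-side safety net only, to be paired with the provider's own billing alerts and hard spend limits where available:

```python
class SpendGuard:
    """Track estimated API spend and refuse calls past a daily budget."""

    def __init__(self, daily_budget_usd: float):
        self.daily_budget_usd = daily_budget_usd
        self.spent_usd = 0.0

    def record(self, cost_usd: float) -> None:
        """Record the estimated cost of a completed call."""
        self.spent_usd += cost_usd

    def allow_call(self, estimated_cost_usd: float) -> bool:
        """Return False if the next call would push spend over budget."""
        return self.spent_usd + estimated_cost_usd <= self.daily_budget_usd

# Assumed budget: $5/day. An agent loop checks the guard before each call.
guard = SpendGuard(daily_budget_usd=5.00)
guard.record(4.50)
print(guard.allow_call(0.25))  # still within budget
print(guard.allow_call(1.00))  # would overshoot: halt the agent
```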
Noted product names and technologies referenced
- Anthropic / Claude (cloud LLM, Claude Code)
- Hermes (agent)
- OpenClaw / “OpenClaw”-style harnesses
- Qwen 3.5 / Qwen 3.5 27B
- OpenWebUI
- RTX 4090, Ryzen 9 5950X (hardware examples)
- Gemma 4 (Google)
- NVIDIA NeMo / NeMo agents
- OpenAI (and hiring activity related to agent tooling)
- Media coverage example: Wall Street Journal
Main speaker and sources
- Speaker: Mudahar (YouTuber, “Me Mudahar”)
- Companies / projects cited: Anthropic (Claude), Hermes, Qwen, OpenWebUI, Gemma 4 (Google), NVIDIA (NeMo), OpenAI, and “OpenClaw”-style agents.
Category
Technology