Summary of "I exploited Copilot and burned $46,000 (it cost $40)"
Overview
This video is a commentary on GitHub Copilot’s evolution from simple autocomplete to “agentic” coding—and how that shift strained (or broke) the economics behind Copilot-style subscription billing.
Main arguments / reports
1) Copilot’s billing change is causing outrage
The creator says Microsoft/GitHub recently announced that Copilot will move from fixed quotas such as “messages per month” to usage-based limits measured in credits, with consumption metered by token count at each model’s API rate.
Users are angry, and some claim the change proves Microsoft can’t afford “subsidies.”
2) The outrage is misinformed—billing was exploitable
The creator argues that users and workflows could burn far more inference cost than what the plan price seemed to imply.
He claims this becomes more severe as models behave more like agents—performing many tool calls/steps per user “message.”
3) “Revenge” via experiment: burning Copilot spend
He describes running extremely expensive inference workloads while using only a small fraction of his Copilot Pro+ quota (the roughly $40/month tier).
He roughly estimates that about 5% of his monthly messages already cost Microsoft $550 or more in inference, and he says he plans to push the total to $40,000 by abusing the remaining loopholes.
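To put that estimate in scale, here is a minimal sketch of a linear extrapolation. The linearity is an assumption (the summary does not say that cost scales evenly across the quota), and the function name is hypothetical. The $550, 5%, and $40 figures are the creator's rough numbers.

```python
def extrapolate_full_quota_cost(cost_so_far: float, quota_fraction_used: float) -> float:
    """Scale an observed cost to 100% of the quota, assuming linear usage."""
    if not 0 < quota_fraction_used <= 1:
        raise ValueError("quota_fraction_used must be in (0, 1]")
    return cost_so_far / quota_fraction_used


SUBSCRIPTION_PRICE = 40.0  # monthly plan price quoted in the video
observed_cost = 550.0      # creator's rough inference-cost estimate
fraction_used = 0.05       # about 5% of the monthly message quota

full_cost = extrapolate_full_quota_cost(observed_cost, fraction_used)
ratio = full_cost / SUBSCRIPTION_PRICE

print(f"Projected cost at full quota: ${full_cost:,.0f}")
print(f"Cost-to-price ratio: {ratio:.0f}x")
```

Under that linear assumption, the full quota would correspond to about $11,000 of inference, roughly 275 times the plan price. Reaching the $40,000 target would therefore need messages that are more expensive on average than the ones behind the $550 figure.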
4) Message-based billing is fundamentally flawed in agentic systems
He outlines several inference billing approaches:
- Subscriptions with rate limits (vague multipliers/black boxes)
- Subscriptions with message limits
- Subscriptions with spend limits (more transparent dollar budgets)
- API billing per token
- (Also discussed) Dedicated compute (enterprise-style reserved GPU capacity)
Key thesis: as models become agentic and do more internal steps per interaction, the cost of a “message” becomes highly variable. This makes message quotas easier to game, while the provider faces unpredictable GPU costs.
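A toy model makes this variability concrete. It assumes that every agent step resends the whole accumulated context as input, and it ignores caching and context-window limits. The per-token prices and token counts are hypothetical placeholders, not figures from the video or from any provider’s price list.

```python
# Hypothetical prices in USD per million tokens (illustrative only).
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 10.00


def message_cost(steps: int, base_context: int = 5_000,
                 output_per_step: int = 500,
                 tool_result_tokens: int = 2_000) -> float:
    """Estimate provider cost of one user message that triggers `steps` model calls.

    Assumption: each step re-reads the full context, which then grows by
    that step's output plus one tool result.
    """
    context = base_context
    input_tokens = 0
    output_tokens = 0
    for _ in range(steps):
        input_tokens += context
        output_tokens += output_per_step
        context += output_per_step + tool_result_tokens
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


for steps in (1, 10, 100, 1000):
    print(f"{steps:>5} steps -> ${message_cost(steps):,.2f} for one 'message'")
```

In this sketch, the cost of a single message grows roughly quadratically with the number of steps, while a message quota counts it as one unit regardless.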
5) He contrasts his own platform’s billing with Copilot’s
He says his company previously used message limits for subscriptions and learned that “messages don’t map cleanly to cost.”
He describes how expensive model tiers (e.g., “Sonnet-like” tiers) were disproportionately costly, and how a small number of users could cause very high inference spend. That led him to change quotas and/or billing, and he implies that Copilot waited too long to do the same.
6) A dramatic example: “one message” that runs for hours
He shares a test run with GPT-5.4 Extra High in which a single prompt used approximately:
- ~111M input tokens
- ~1.6M output tokens
- ~16 hours of runtime
He estimates raw API cost could be about $163 without caching, and still around $62 with caching—turning “one message” into potentially massive spend.
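The arithmetic behind an estimate like this can be sketched as follows. The token counts come from the example above. The per-million-token prices, the cache hit rate, and the cached-input discount are placeholders (assumptions), so the totals will not match the creator’s $163 and $62 figures unless the real rates are substituted. The function name is hypothetical.

```python
def run_cost(input_tokens: float, output_tokens: float,
             input_price_per_m: float, output_price_per_m: float,
             cached_fraction: float = 0.0,
             cached_discount: float = 1.0) -> float:
    """Return the USD cost of one run.

    cached_fraction: share of input tokens served from cache (0..1).
    cached_discount: price multiplier for cached input (0.1 = 10x cheaper).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    input_cost = (uncached + cached * cached_discount) * input_price_per_m / 1e6
    output_cost = output_tokens * output_price_per_m / 1e6
    return input_cost + output_cost


# Token counts reported in the summary.
INPUT_TOKENS = 111_000_000
OUTPUT_TOKENS = 1_600_000

# PLACEHOLDER prices in USD per million tokens; not real provider rates.
INPUT_PRICE = 1.00
OUTPUT_PRICE = 10.00

no_cache = run_cost(INPUT_TOKENS, OUTPUT_TOKENS, INPUT_PRICE, OUTPUT_PRICE)
with_cache = run_cost(INPUT_TOKENS, OUTPUT_TOKENS, INPUT_PRICE, OUTPUT_PRICE,
                      cached_fraction=0.9, cached_discount=0.1)

print(f"No caching:   ${no_cache:,.2f}")
print(f"With caching: ${with_cache:,.2f}")
```

Because input tokens dominate the run, the cache hit rate and cached-input discount drive most of the gap between the two totals. Output tokens are billed in full either way.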
7) “Cryptography puzzle” as an abuse workload
He claims he designs cryptography puzzles intended to be hard for models, forcing them to run longer.
He then automated Copilot sessions to keep iterating on the puzzle until they hit a “planned completion pattern,” using prompt constraints such as:
- don’t access other files/computers/social media
- “keep going until plain text answer”
He also modified the prompt to make the task “unsolvable,” forcing long-running behavior instead of quick completion.
8) Caching lowers costs but doesn’t eliminate the exploit
He argues that caching makes token costs cheaper (he references something like “10x cheaper cached input”), but the total cost can still be high enough that quotas can be exploited.
9) June 1st billing overhaul is framed as a necessary fix
He insists this is not a “rug pull” or an attempt to extract more money. Instead, he says providers are switching because they lack enough GPU compute to subsidize massively exploitable workloads indefinitely.
He also argues that enterprises paying for large amounts of compute shouldn’t be deprived of capacity because consumer subscriptions allow runaway costs.
10) He blames loophole tools (and condemns abusive behavior)
He repeatedly criticizes tools/workflows that pack whole codebases into a single prompt, which dramatically inflates input size and encourages expensive runs (specifically calling out a “Repo Mix”-style tool).
His stance is that abuse is “cringe,” harms small businesses, and crosses lines.
Overall stance / conclusion
The creator presents the billing change as an overdue correction for an exploit that worsened once agentic workflows allowed far more computation per interaction.
He positions himself as both:
- a “tester” measuring actual cost, and
- a spiteful user attempting to demonstrate the mismatch between what people pay (e.g., $40) and what compute can be consumed.
He argues the new token/credit-based approach is meant to prevent scenarios where consumer billing doesn’t cover the real GPU usage.
Presenters / contributors
- Solo presenter: The video appears to be narrated by the creator (unnamed in the subtitles). He references prior work involving Azure benchmarking, runs a subscription platform called T3 Chat, and conducts tests involving Copilot.
Category
News and Commentary