Summary of "I exploited Copilot and burned $46,000 (it cost $40)"

Overview

This video is a commentary on GitHub Copilot’s evolution from simple autocomplete to “agentic” coding, and on how that shift strained (or broke) the economics of Copilot-style subscription billing.


Main arguments / reports

1) Copilot’s billing change is causing outrage

The creator says Microsoft/GitHub recently announced that Copilot will shift from fixed limits such as “messages per month” to credit-based limits, with usage metered by token consumption at each model’s API rates.

Users are angry, and some claim the change proves Microsoft can’t afford “subsidies.”


2) The outrage is misinformed—billing was exploitable

The creator argues that, under the old billing, users and workflows could burn far more in inference cost than the plan price covered.

He claims this becomes more severe as models behave more like agents—performing many tool calls/steps per user “message.”


3) “Revenge” via experiment: burning Copilot spend

He describes running extremely expensive inference workloads while only using a small fraction of his Copilot Plus quota (the $40/month tier).

He provides a rough estimate that with about 5% of his monthly messages, the inference cost to Microsoft could be $550+, and he claims he plans to push the total to $40,000 by abusing remaining loopholes.
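
As a rough check on the scale involved, the estimate above can be extrapolated to a full month of quota. This is only a sketch: it assumes cost grows linearly with the share of quota used, which the summary does not state, and the variable names are illustrative.

```python
# Figures quoted in the summary.
plan_price_usd = 40.0        # monthly subscription price
quota_fraction_used = 0.05   # about 5% of monthly messages
cost_so_far_usd = 550.0      # lower bound ("$550+") on estimated inference cost

# Assumption: cost grows linearly with the share of quota consumed.
projected_full_quota_cost = cost_so_far_usd / quota_fraction_used
cost_to_price_ratio = projected_full_quota_cost / plan_price_usd

print(f"Projected cost at 100% of quota: ${projected_full_quota_cost:,.0f}")
print(f"Ratio of projected cost to plan price: {cost_to_price_ratio:.0f}x")
```

Under that linear assumption, the quoted figures project to roughly $11,000 of inference cost against a $40 plan.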


4) Message-based billing is fundamentally flawed in agentic systems

He outlines several inference billing approaches:

  1. Subscriptions with rate limits (vague multipliers/black boxes)
  2. Subscriptions with message limits
  3. Subscriptions with spend limits (more transparent dollar budgets)
  4. API billing per token
  5. (Also discussed) Dedicated compute (enterprise-style reserved GPU capacity)

Key thesis: as models become agentic and do more internal steps per interaction, the cost of a “message” becomes highly variable. This makes message quotas easier to game, while the provider faces unpredictable GPU costs.
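
The thesis can be illustrated with a toy comparison of how the same two requests are charged under a message quota versus per-token billing. Everything here is hypothetical: the `Run` type, the step and token counts, and the price per million tokens are made-up values chosen only to show the gap, not figures from the video.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One user 'message' and the model work it triggers."""
    label: str
    steps: int            # model/tool-call steps performed for this message
    tokens_per_step: int  # tokens processed per step (input + output)


# Hypothetical price, for illustration only.
USD_PER_MILLION_TOKENS = 10.0


def token_cost_usd(run: Run) -> float:
    """Cost if the run is billed by tokens consumed."""
    total_tokens = run.steps * run.tokens_per_step
    return total_tokens / 1_000_000 * USD_PER_MILLION_TOKENS


def messages_charged(run: Run) -> int:
    """Quota consumed under message-based billing: one per message."""
    return 1


chat = Run("single chat reply", steps=1, tokens_per_step=4_000)
agent = Run("long agentic task", steps=500, tokens_per_step=40_000)

for run in (chat, agent):
    print(f"{run.label}: {messages_charged(run)} message, "
          f"${token_cost_usd(run):.2f} in token cost")
```

Both runs consume one unit of a message quota, while their token cost differs by several orders of magnitude under these assumed numbers.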


5) He contrasts his own platform’s billing with Copilot’s

He says his company previously used message limits for subscriptions and learned that “messages don’t map cleanly to cost.”

He describes how expensive model tiers (e.g., “Sonnet-like” tiers) were disproportionately costly, and how a small number of users could cause very high inference spend. That led him to change quotas and/or billing later, and he implies Copilot also waited too long.


6) A dramatic example: “one message” that runs for hours

He shares a test from GPT-5.4 Extra High: a single prompt produced approximately:

He estimates raw API cost could be about $163 without caching, and still around $62 with caching—turning “one message” into potentially massive spend.
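
To put those two estimates in proportion, the arithmetic below compares them with each other and with the $40 monthly plan price mentioned earlier. It uses only the figures quoted in this summary; the variable names are illustrative.

```python
plan_price_usd = 40.0      # monthly subscription price cited in the summary
cost_uncached_usd = 163.0  # estimated raw API cost without caching
cost_cached_usd = 62.0     # estimated cost with caching

# Fraction of the uncached estimate that caching removes.
caching_savings = 1 - cost_cached_usd / cost_uncached_usd
# Cost of the single cached message relative to a whole month's plan price.
share_of_plan_price = cost_cached_usd / plan_price_usd

print(f"Caching reduces the estimate by {caching_savings:.0%}")
print(f"One cached message costs {share_of_plan_price:.2f}x the monthly plan price")
```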


7) “Cryptography puzzle” as an abuse workload

He claims he designs cryptography puzzles intended to be hard for models, forcing them to run longer.

He then automated Copilot sessions to repeatedly solve/iterate until it hit a “planned completion pattern,” using constraints such as:


8) Caching lowers costs but doesn’t eliminate the exploit

He argues that caching makes token costs cheaper (he references something like “10x cheaper cached input”), but the total cost can still be high enough that quotas can be exploited.
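
A minimal cost model shows why a large discount on cached input does not shrink the total by the same factor: uncached input and output tokens are still billed at full price. The function name, token counts, prices, and cache hit rate are all assumptions made for illustration; only the "10x cheaper cached input" ratio comes from the summary.

```python
def run_cost_usd(input_tokens: int, output_tokens: int, cache_hit_rate: float,
                 input_price: float, output_price: float,
                 cached_discount: float = 10.0) -> float:
    """Estimate the cost of one run; prices are USD per million tokens.

    cached_discount=10 models "10x cheaper cached input".
    """
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    input_cost = (uncached * input_price
                  + cached * input_price / cached_discount) / 1e6
    output_cost = output_tokens * output_price / 1e6
    return input_cost + output_cost


# Hypothetical workload and prices, for illustration only.
args = dict(input_tokens=50_000_000, output_tokens=1_000_000,
            input_price=2.0, output_price=10.0)

no_cache = run_cost_usd(cache_hit_rate=0.0, **args)
with_cache = run_cost_usd(cache_hit_rate=0.9, **args)
print(f"No caching: ${no_cache:.2f}; 90% cache hits: ${with_cache:.2f}")
```

With these made-up inputs the total falls from about $110 to about $29, a reduction of roughly 4x rather than 10x.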


9) June 1st billing overhaul is framed as a necessary fix

He insists this is not a “rug pull” or an attempt to extract more money. Instead, he says providers are switching because they lack enough GPU compute to subsidize massively exploitable workloads indefinitely.

He also argues that enterprises paying for large amounts of compute shouldn’t lose capacity because consumer subscriptions allow runaway costs.


10) He blames loophole tools (but condemns abusive behavior)

He repeatedly criticizes tools/workflows that compress whole codebases into formats that dramatically inflate inputs and encourage expensive runs (specifically calling out a “Repo Mix”-style tool).

His stance is that abuse is “cringe,” harms small businesses, and crosses a line.


Overall stance / conclusion

The creator presents the billing change as an overdue correction for an exploit that worsened once agentic workflows allowed far more computation per interaction.

He positions himself as both:

He argues the new token/credit-based approach is meant to prevent scenarios where consumer billing doesn’t cover the real cost of GPU usage.


Category

News and Commentary

