Matthew Hutchings | Senior Technical Consultant & AI Automation Specialist

On 1st June, GitHub switched Copilot from a flat subscription to token-based billing. Within days, developers were posting screenshots of bills that had jumped from $29 to $750. Some went from $50 to $3,000. The numbers felt absurd. They were not.

This was not a bug. It was not an accident. It was the market telling you something it has been working up to for two years.

What Actually Changed

The old Copilot Individual plan was $10 per month. Flat. Use it for ten minutes or ten hours, the bill was the same. The new model bills by tokens consumed: input tokens sent to the model, output tokens returned, and in the case of agents, the repeated round-trips those agents make in the background while you are doing something else.

A developer writing code manually, reviewing suggestions, and accepting or rejecting completions spends tokens at a rate that stays roughly predictable. A developer who turned on Copilot Workspace, spun up an agent to refactor a large codebase, and walked away from the laptop for an hour does not.

The developers posting four-figure bills were not power users in any meaningful sense. They were users who had automated their AI usage and then stopped watching it.

The Buffet Is Closing

For two years, AI tools have been sold on a gym membership model. One price. Use whatever you want. This kept adoption numbers high and gave providers the growth story they needed to justify their valuations. Behind the numbers, the economics were never sustainable at the prices being charged.

OpenAI, Anthropic, Google, GitHub, Cursor. Every major AI tool provider has been subsidising your usage. That is now ending, tool by tool, quarter by quarter. GitHub Copilot did not invent token-based billing. It just became the one big enough and visible enough that developers actually noticed.

Cursor already has usage tiers. ChatGPT Plus caps requests on certain model tiers. Claude Pro has message limits. These are all versions of the same pressure. The next eighteen months will see every AI tool you have bookmarked move toward consumption-based pricing, or introduce hard caps that make the distinction academic.

The Real Cost Was Always There

Here is what makes the Copilot situation interesting from an architecture standpoint. The token costs that developers are now seeing on their bills were always real. GitHub was just absorbing them.

When a Copilot agent loops through a large codebase, reads file after file, generates a plan, rewrites components, and re-checks its own output, it is making dozens or hundreds of model calls. Each call processes context. Context has a cost per token.
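
To make the arithmetic concrete, here is a minimal sketch of how an agent loop accumulates cost. It is not GitHub's billing formula: the prices, the `Pricing` class, and the function names are hypothetical, and it assumes every step re-sends the full accumulated context before adding what it read and wrote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per million tokens (not real provider rates)."""
    input_per_million: float
    output_per_million: float


def call_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost in USD of one model call; input and output tokens are priced separately."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


def agent_run_cost(
    steps: int,
    base_context_tokens: int,
    tokens_read_per_step: int,
    output_tokens_per_step: int,
    pricing: Pricing,
) -> float:
    """Total cost of an agent loop whose context grows on every step.

    Assumption: each step re-sends the whole accumulated context, then
    appends the newly read material and its own output to that context.
    """
    total = 0.0
    context = base_context_tokens
    for _ in range(steps):
        total += call_cost(context, output_tokens_per_step, pricing)
        context += tokens_read_per_step + output_tokens_per_step
    return total


if __name__ == "__main__":
    pricing = Pricing(input_per_million=3.0, output_per_million=15.0)  # made-up rates
    for steps in (1, 10, 100):
        cost = agent_run_cost(steps, 2_000, 1_500, 300, pricing)
        print(f"{steps:>3} steps: ${cost:.2f}")
```

Under these assumptions the input side grows faster than the step count, because every new call pays again for everything the agent has already read. That is why an unattended loop can cost far more than the same number of short, independent completions.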

When I build AI automation workflows for clients, one of the first conversations we have is about context window design. Not out of academic interest in efficiency, but because context costs money, and inefficient context design is one of the fastest ways to turn a useful automation into an uneconomical one. That conversation is now going to happen at every company that uses AI tools, not just the ones building them.

The developers who were blindsided in June had never had to think about this. Their employer or their wallet was insulated by a flat fee. The insulation is gone.

What This Means for Businesses Using AI Tools

If you are a business owner or technical lead watching this from the sidelines, the lesson is not to avoid AI tools. It is to approach them differently from how most companies have so far.

The typical adoption pattern has been: sign up, experiment, let the team use it freely, hope that productivity improves, and never audit what was actually spent or earned. That pattern worked when costs were fixed. It will not work when costs scale with usage.

Every AI chatbot or agent your business runs has a token consumption profile. A customer service agent that handles 500 conversations per day, each with a long context window and three model calls per message, costs materially more to run than one designed to answer quickly with tight, focused prompts. If your business built that agent when API pricing was lower, or when a flat subscription was absorbing the cost, your margin calculation may now be wrong.
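
A rough way to compare the two designs is to multiply the profile out. The sketch below is illustrative only: the function name, the messages-per-conversation figure, the token counts, and the prices are all assumptions, not measurements from a real deployment.

```python
def daily_agent_cost(
    conversations_per_day: int,
    messages_per_conversation: int,
    calls_per_message: int,
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimated daily spend in USD for a conversational agent.

    All prices are hypothetical USD-per-million-token figures.
    """
    calls = conversations_per_day * messages_per_conversation * calls_per_message
    input_cost = calls * input_tokens_per_call * input_price_per_million
    output_cost = calls * output_tokens_per_call * output_price_per_million
    return (input_cost + output_cost) / 1_000_000


if __name__ == "__main__":
    # Long context, three model calls per message (all numbers assumed).
    heavy = daily_agent_cost(500, 6, 3, 8_000, 300, 3.0, 15.0)
    # Tight prompt, one model call per message (all numbers assumed).
    lean = daily_agent_cost(500, 6, 1, 1_500, 300, 3.0, 15.0)
    print(f"heavy design: ${heavy:.2f}/day, lean design: ${lean:.2f}/day")
```

The absolute figures mean nothing outside these assumed inputs. The point is that the same 500 conversations can differ by a large multiple depending on context size and calls per message.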

The same applies to workflow orchestration. Automations that chain multiple AI steps together, passing large payloads between them, are significantly more expensive than well-designed pipelines that pass only what the model needs. We built a document extraction workflow for a client last year and cut their per-document AI cost by 60% simply by trimming the context before each model call. The logic was identical. The spend was not.
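
One common form of trimming is to pass the model only the fields a step needs rather than the whole upstream payload. The sketch below shows that idea in general terms. It is not the client workflow itself: the field names are invented, and the four-characters-per-token estimate is a crude heuristic rather than a real tokenizer.

```python
import json


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token (heuristic only)."""
    return max(1, len(text) // 4)


def trim_payload(document: dict, needed_fields: list[str]) -> dict:
    """Keep only the fields this model call needs; silently skip missing ones."""
    return {key: document[key] for key in needed_fields if key in document}


def build_prompt(instruction: str, payload: dict) -> str:
    """Combine an instruction with a JSON-serialised payload."""
    return f"{instruction}\n\n{json.dumps(payload, ensure_ascii=False)}"


if __name__ == "__main__":
    # Hypothetical upstream payload carrying far more than the step requires.
    document = {
        "invoice_number": "INV-1",
        "total": "100.00",
        "raw_ocr_text": "x" * 4_000,
        "audit_trail": ["step"] * 200,
    }
    instruction = "Validate the invoice total."
    full = build_prompt(instruction, document)
    lean = build_prompt(instruction, trim_payload(document, ["invoice_number", "total"]))
    print(estimate_tokens(full), "tokens before,", estimate_tokens(lean), "after (estimated)")
```

In a real pipeline the estimate would come from the provider's reported token usage, and the list of needed fields would be decided per step.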

Where Discipline Wins

The businesses that come out ahead in the token-billing era are not the ones using AI the most. They are the ones who understand what each model call costs them and what it returns.

That means auditing your current AI usage before any new tools go live. Which workflows are running? How often? How many tokens per run? What is the business output? A workflow that saves a team member two hours per week at $0.15 in API costs is a strong return. One that runs hourly, processes large documents, and saves the team ten minutes per week is not.
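
The comparison can be reduced to a single number per workflow. This is a minimal sketch under stated assumptions: the hourly rate, the per-run cost of the second workflow, and the reading of the $0.15 as a weekly total are all illustrative choices, not figures from an actual audit.

```python
def weekly_net_return(
    hours_saved_per_week: float,
    hourly_rate: float,
    runs_per_week: int,
    cost_per_run: float,
) -> float:
    """Value of time saved minus AI spend, in USD per week."""
    return hours_saved_per_week * hourly_rate - runs_per_week * cost_per_run


if __name__ == "__main__":
    hourly_rate = 50.0  # hypothetical loaded cost of a team member's hour

    # Saves two hours a week for $0.15 of API spend (treated as one weekly run).
    strong = weekly_net_return(2.0, hourly_rate, runs_per_week=1, cost_per_run=0.15)

    # Runs hourly (168 runs a week) on large documents, saves ten minutes a week.
    # The $0.40 per run is an assumed figure.
    weak = weekly_net_return(10 / 60, hourly_rate, runs_per_week=168, cost_per_run=0.40)

    print(f"strong workflow: {strong:+.2f} USD/week")
    print(f"weak workflow:   {weak:+.2f} USD/week")
```

A negative result does not automatically mean the workflow should be switched off, but it does mean someone should be able to explain why it is still running.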

It also means building AI systems with cost visibility from the start. Token counts, cost per run, and cost per output should be logged and reviewable, not invisible. If the team running an AI automation cannot tell you what it costs to run, they cannot tell you whether it is working.
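
As a sketch of what that visibility could look like, the snippet below records token counts per run and reports cost per run and cost per output. The class names, fields, and prices are hypothetical. A real system would take the token counts from the usage data the provider returns with each response and write the records to durable storage rather than a list in memory.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    workflow: str
    input_tokens: int
    output_tokens: int
    outputs_produced: int
    cost_usd: float


@dataclass
class UsageLog:
    """In-memory usage log; prices are hypothetical USD per million tokens."""
    input_price_per_million: float
    output_price_per_million: float
    records: list[RunRecord] = field(default_factory=list)

    def record(
        self, workflow: str, input_tokens: int, output_tokens: int, outputs_produced: int
    ) -> RunRecord:
        """Store one run and the cost derived from its token counts."""
        cost = (
            input_tokens * self.input_price_per_million
            + output_tokens * self.output_price_per_million
        ) / 1_000_000
        rec = RunRecord(workflow, input_tokens, output_tokens, outputs_produced, cost)
        self.records.append(rec)
        return rec

    def summary(self, workflow: str) -> dict:
        """Token counts, cost per run, and cost per output for one workflow."""
        runs = [r for r in self.records if r.workflow == workflow]
        total_cost = sum(r.cost_usd for r in runs)
        total_outputs = sum(r.outputs_produced for r in runs)
        return {
            "runs": len(runs),
            "total_tokens": sum(r.input_tokens + r.output_tokens for r in runs),
            "total_cost_usd": total_cost,
            "cost_per_run_usd": total_cost / len(runs) if runs else 0.0,
            "cost_per_output_usd": total_cost / total_outputs if total_outputs else 0.0,
        }


if __name__ == "__main__":
    log = UsageLog(input_price_per_million=3.0, output_price_per_million=15.0)
    log.record("document-extraction", 10_000, 1_000, outputs_produced=5)
    log.record("document-extraction", 20_000, 2_000, outputs_produced=10)
    print(log.summary("document-extraction"))
```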

This is not a pessimistic case against AI. The tools are genuinely useful and the use cases are real. But a tool that costs more than it saves is not a competitive advantage. It is a liability dressed up in impressive demos.

The Meter Is Now Running

GitHub Copilot was a wake-up call for individual developers. The equivalent moment for businesses is coming. Some providers will raise prices quietly. Others will restructure tiers. A few will introduce hard caps that make token-based billing feel nostalgic. The direction is the same regardless of the mechanism.

The developers who adapted quickly to the Copilot change were the ones who already understood how the underlying models work. They knew which features burn tokens fast, which settings to adjust, and where the value actually sits. They optimised.

The ones who got hurt were treating AI as a vending machine: put a flat fee in, take outputs out, never think about what happens in between.

The flat-fee era is ending. Knowing what the meter reads is now part of the job.