6 May 2026 · Nick Finch

GitHub changed the deal. Here is how we built ours to hold.

We bought annual Copilot for our legacy .NET work, and GitHub changed the deal. Here is the discipline we build into our agents so our customers never feel about us the way we now feel about GitHub.

Agentic AI · AI Coding Agents · Token Economics · Architecture · Enterprise AI

We use Claude Code for new development at inmydata, and have done for some time. For our legacy .NET work we bought annual Copilot. Copilot’s tight integration with the Microsoft stack made it the right tool for that codebase. That was the deal we paid for.

On 20 April, GitHub changed the licensing terms significantly. Annual plans will see model multipliers increase. Standard-tier models that were 0x will no longer be free. No new models or features will be added to annual plans going forward. The contract we signed has been materially diluted.

We are consolidating everything onto Claude Code, including the legacy .NET work. We do not yet know if the usage limits on our Claude Max subscriptions hold for the combined new-plus-legacy load. We will find out.

The pattern is industry-wide

This is not a one-off. The day after GitHub’s announcement, Anthropic ran a 2% A/B test that removed Claude Code from the $20 Pro plan for new prosumer signups, and quietly updated the public pricing page to match. The page change was reversed within twenty-four hours after community pushback. The server-side test is still running. Anthropic’s Head of Growth, Amol Avasare, called the page update a mistake but conceded the wider point. Their current plans, he said, were not built for this.

That is the same admission GitHub made, in different words. Long-running, parallelised agentic sessions consume far more compute than the original plan structure was built to support. Two of the largest coding-agent vendors in the Western market admitted in writing within seven days that flat-rate billing had broken under agentic load.

The macro numbers say the same thing. Agentic models consume between five and thirty times more tokens per task than a standard chatbot. Inference is now 85% of the average enterprise AI budget, and that budget grew from $1.2 million in 2024 to $7 million in 2026. Token-based billing is the direction of travel across the market, and annual prepayment did not protect us from it.

The wider truth

The vendor pricing story is not really the point. It is a visible symptom of something less visible. Agentic AI consumes tokens at a rate that punishes anyone who has not thought carefully about the economics. That applies to the vendors selling coding agents on flat-rate plans, and it applies just as much to anyone, like us, building agentic products on top of frontier models.

We tested our own token economics early, found the cost uncomfortable, and did the engineering work to bring it under control. The pricing news from GitHub is a clean example of what happens when a product ships without a deep understanding of its token economics. It illustrates, in a place readers can see, the same numbers we had already met on our own bill.

Spend Haiku to save Opus

The discipline shows up as a single architectural pattern expressed three ways, at three different layers of our stack. Cheap classification in front of every expensive call.

A coffee-buying expert system. The chat-based expert system we built for a coffee-buying customer has access to 23 data sources. ERP demand data, demand predictions, crop forecasts for raw coffee, weather forecasts, currency exchange rates, market intelligence. The original architecture used Opus to evaluate the schemas of all 23 datasets, judge what was important, and decide what to extract. The schemas are large. Repeatedly passing them to an expensive model accumulated cost rapidly. We absorbed it intentionally during the early period, because we wanted to see what the real-world economics looked like before we passed the bill on.

The architectural fix, which we had learned on Studio, was to use Haiku to assess the relevance of each dataset against the question, even at the column level, then pass a filtered schema to Opus. Token cost dropped to below 40% of the original, a more than 60% reduction, with no degradation in answer quality. The saving compounds in two directions. The schema is smaller per iteration, and there are fewer iterations because Opus has less to be curious about. The configurable iteration ceiling we have written about previously sits on top as a safety net. It is not the load-bearing mechanism. The load-bearing mechanism is filtering the surface the expensive model is allowed to reason over.
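For concreteness, here is a minimal sketch of that filtering step using the Anthropic Python SDK. The model names, column structure, and prompts are illustrative stand-ins rather than our production code; the real schemas are far larger and the relevance check runs per dataset and per column.

```python
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

CHEAP_MODEL = "claude-3-5-haiku-latest"  # stand-in: whichever Haiku-class model is current
EXPENSIVE_MODEL = "claude-opus-4-0"      # stand-in: the Opus-class answer model


def filter_schema(question: str, dataset_name: str, columns: list[dict]) -> list[dict]:
    """Ask the cheap model which columns are relevant to the question.

    `columns` is a list of {"name": ..., "description": ...} dicts (hypothetical
    structure). The return value is the subset worth showing to the expensive model.
    """
    prompt = (
        f"Question: {question}\n"
        f"Dataset: {dataset_name}\n"
        f"Columns: {json.dumps(columns)}\n"
        "Return a JSON array of the column names needed to answer the question. "
        "Return [] if this dataset is irrelevant."
    )
    response = client.messages.create(
        model=CHEAP_MODEL,
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    try:
        keep = set(json.loads(response.content[0].text))
    except (json.JSONDecodeError, IndexError):
        keep = {c["name"] for c in columns}  # fail open: if the filter is unreadable, keep everything
    return [c for c in columns if c["name"] in keep]


def answer(question: str, datasets: dict[str, list[dict]]) -> str:
    # Spend Haiku first: shrink every schema before Opus sees any of them.
    filtered = {name: filter_schema(question, name, cols) for name, cols in datasets.items()}
    filtered = {name: cols for name, cols in filtered.items() if cols}

    response = client.messages.create(
        model=EXPENSIVE_MODEL,
        max_tokens=2048,
        system="Answer using only the datasets and columns provided.",
        messages=[{
            "role": "user",
            "content": f"Schemas: {json.dumps(filtered)}\n\nQuestion: {question}",
        }],
    )
    return response.content[0].text
```

The point of the shape is that the expensive model only ever sees the columns the cheap model has already judged relevant, so cost scales with the question rather than with the size of the warehouse.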

inmydata Studio. Our agent-driven dashboard designer hit the same problem first, which is where we learned the pattern. A single Sales Dashboard request was burning over 190,000 input tokens against a projected 8,000. The cost driver was a roughly 50,000-token data schema repeated through conversation history across four to five LLM round-trips, with prompt caching not yet implemented. After progressive schema narrowing, combined with Anthropic prompt caching at the system-prompt and schema layers, dashboard generation came down to around 62,000 tokens. A 68% reduction. The Personal plan went from giving customers two or three dashboards a month to seventy-six.
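The caching half of that fix is Anthropic's documented prompt-caching feature: mark the large, stable blocks, the system prompt and the schema, with cache_control so the four to five round-trips per dashboard reuse them from cache instead of paying full input price every time. A sketch, with the prompt text, schema, and model name as placeholders:

```python
import anthropic

client = anthropic.Anthropic()

DASHBOARD_SYSTEM_PROMPT = "You are a dashboard designer..."  # placeholder for the real prompt
DATA_SCHEMA = "..."  # placeholder for the large data schema, already narrowed to relevant tables


def design_step(conversation: list[dict]):
    """One LLM round-trip in the dashboard-design loop.

    The system prompt and schema are marked cacheable, so repeated round-trips
    within the cache window read them back at the cached-input rate instead of
    paying full input cost each time.
    """
    return client.messages.create(
        model="claude-3-5-haiku-latest",  # placeholder for the model Studio actually uses
        max_tokens=2048,
        system=[
            {
                "type": "text",
                "text": DASHBOARD_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},
            },
            {
                "type": "text",
                "text": f"Data schema:\n{DATA_SCHEMA}",
                "cache_control": {"type": "ephemeral"},
            },
        ],
        messages=conversation,
    )
```

The response's usage object reports cache_creation_input_tokens and cache_read_input_tokens, so the saving is directly measurable per round-trip.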

Agentic-rag. The retrieval pipeline running underneath our knowledge systems puts two cheap Haiku classifiers on either side of the expensive Opus call. A pre-retrieval gate decides whether to search, reformulate, or skip before the retriever runs at all. Conversational acknowledgements never trigger retrieval. The post-retrieval relevance judge runs only on borderline scores, dropping topically-adjacent-but-irrelevant chunks before they reach the answer model. Both classifiers fail safe, defaulting to search or to keep-the-chunk, so reliability does not depend on the cheap model being right. Same pattern, different layer. Spend Haiku to save Opus.
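Sketched under the same caveats (the thresholds, prompts, and model name are assumptions, and the retriever itself is omitted), the two gates look roughly like this:

```python
import anthropic

client = anthropic.Anthropic()
HAIKU = "claude-3-5-haiku-latest"  # assumption: whichever Haiku-class model is current


def retrieval_decision(user_message: str) -> str:
    """Pre-retrieval gate: returns 'search', 'reformulate', or 'skip'."""
    response = client.messages.create(
        model=HAIKU,
        max_tokens=10,
        system=(
            "Classify the user's message for a knowledge-base assistant. "
            "Reply with exactly one word: SEARCH if it needs a lookup as written, "
            "REFORMULATE if it needs a lookup but should be rephrased first, "
            "or SKIP if it is small talk or an acknowledgement."
        ),
        messages=[{"role": "user", "content": user_message}],
    )
    verdict = response.content[0].text.strip().upper()
    if verdict in ("SEARCH", "REFORMULATE", "SKIP"):
        return verdict.lower()
    return "search"  # fail safe: an unexpected reply means we search anyway


def keep_chunk(question: str, chunk: str, score: float,
               low: float = 0.35, high: float = 0.75) -> bool:
    """Post-retrieval judge: only borderline similarity scores cost an LLM call."""
    if score >= high:
        return True   # clearly relevant, no call needed
    if score < low:
        return False  # clearly irrelevant, no call needed
    response = client.messages.create(
        model=HAIKU,
        max_tokens=5,
        system="Reply with exactly YES if the passage helps answer the question, otherwise NO.",
        messages=[{
            "role": "user",
            "content": f"Question: {question}\n\nPassage:\n{chunk}",
        }],
    )
    verdict = response.content[0].text.strip().upper()
    return verdict != "NO"  # fail safe: keep the chunk unless the cheap model is sure it is noise
```

Both functions default toward the costlier-but-safer behaviour, searching anyway and keeping the chunk, which is what lets a small model sit in the hot path without becoming a reliability risk.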

You cannot buy your way out of this

The obvious counterargument is that token prices will keep falling, so the right answer is to wait, or to buy more. Token prices fell by a factor of 280 in 18 months. Enterprise AI spending tripled over the same period. Studio's 24x overrun happened on Haiku, the cheapest frontier-class model on the market. The vendor was not the problem. You cannot buy your way out of an architectural problem that scales with usage faster than prices fall.

Doubling the token allowance buys you two months. Architecting the agent so that the bulk of the calls go to a cheap classifier and only the borderline cases reach the expensive model gives you a stable cost structure that survives the next price change in either direction. The bill can move. The architecture absorbs it.

What we owe the customer

Vendors will keep adjusting their pricing as they learn what agentic load really costs. We cannot control that. Today it is GitHub. Tomorrow it could be the other one. Cursor took two cycles to land its pricing model. GitHub is starting that cycle now and will not be done by 1 June.

What we can control is the contract we offer our own customers. Most of what we build has a per-token component. We do not hide that. We talk it through up front, and we put the engineering work into bringing the per-token cost down to a level where the customer sees a clear return. Token discipline is a cost-management technique, and a serious one. It is also what makes that conversation honest.

We have been writing about this for a while. Constraining your agents makes them better. Bounded autonomy turns out to be the architectural shape the market is now pricing for.

The deal we paid for changed. The deal we ship will not.

Want to discuss this?

We're always happy to talk about AI, data, and what it takes to ship real systems.

Get in touch