Key takeaways:
- The AI-coding free lunch is over. Agentic tools burn far more tokens than engineers signed up for, and vendors are responding with price hikes and usage limits.
- Neither engineers nor vendors anticipated how quickly agentic workflows would drive up inference costs.
- Managers now have to justify the spend, not just celebrate the speed.
AI-coding tools and harnesses have changed the role of software engineering – and fast. Devs have started tokenmaxxing, with some companies, including Meta, launching leaderboards to measure their workers’ ability to burn through AI usage.
Agentic coding tools have radically changed AI usage, acting semi-independently through command lines, files, internet access, and tool calls. This also means they burn far more tokens than older chat-based tools, says Sebastian Baltes, professor of software engineering at the University of Heidelberg. “I’m a daily user of these tools myself, and it’s just astonishing how fast the field moves,” he says.
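To see why the token profile differs so much, consider a rough back-of-the-envelope model: a chat exchange sends one prompt and gets one reply, while an agent loop resends the ever-growing transcript on every tool call. The sketch below is purely illustrative, with made-up token counts rather than measurements from any particular tool.

```python
# Illustrative only: a toy accounting of why agent loops consume far more
# tokens than single chat exchanges. All numbers are hypothetical.

PROMPT_TOKENS = 1_500        # system prompt + user task
TOOL_OUTPUT_TOKENS = 2_000   # e.g. file contents or command output per step
RESPONSE_TOKENS = 500        # model reply per step


def chat_tokens() -> int:
    """One prompt, one reply: roughly what older chat-based tools did."""
    return PROMPT_TOKENS + RESPONSE_TOKENS


def agent_tokens(steps: int) -> int:
    """An agent loop resends the growing transcript on every tool call,
    so input tokens compound with each step."""
    total = 0
    context = PROMPT_TOKENS
    for _ in range(steps):
        total += context + RESPONSE_TOKENS                # full context in, reply out
        context += RESPONSE_TOKENS + TOOL_OUTPUT_TOKENS   # transcript keeps growing
    return total


print(chat_tokens())     # 2,000 tokens for a single exchange
print(agent_tokens(20))  # 515,000 tokens for a 20-step agentic session
```

Under these toy assumptions, a 20-step agentic session burns more than 250 times the tokens of a single chat exchange, which is the dynamic vendors are now trying to price for.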
As software engineering evolves, the suppliers of the underlying AI inference, including the biggest names in tech, have pulled the rug from beneath coders and the companies they work for.
Price hikes, tighter usage limits, and some providers closing to new signups have upended a way of working that software engineers were only just getting used to.
Rising prices
The latest example is GitHub Copilot, which has tightened session‑ and weekly‑token limits, moved toward a token‑based “AI Credit” billing model, and nudged users to cheaper models and more efficient workflows for routine tasks.
By introducing multipliers that make heavier models and long‑running agentic sessions consume credits faster, it discourages tokenmaxxing while keeping the most capable models available for the complex reasoning tasks that genuinely need them.
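To make the arithmetic concrete, here is a minimal sketch of how a multiplier-based credit scheme translates requests into spend. The multiplier values, model tiers, and monthly allowance are hypothetical, not GitHub's published rates.

```python
# Hypothetical illustration of a multiplier-based credit scheme; the
# multipliers and allowance below are made up, not any vendor's published rates.

MULTIPLIERS = {
    "base-model": 0.33,       # cheap model for routine completions
    "premium-model": 1.0,     # standard premium request
    "reasoning-model": 10.0,  # heavy, long-running agentic sessions
}

MONTHLY_CREDITS = 300  # illustrative allowance included with a seat


def credits_used(requests: dict[str, int]) -> float:
    """Total credits consumed for a month of requests, per model tier."""
    return sum(MULTIPLIERS[model] * count for model, count in requests.items())


# The same 200 requests cost very different amounts depending on routing.
print(credits_used({"base-model": 200}))       # 66.0 credits: well within the cap
print(credits_used({"reasoning-model": 200}))  # 2000.0 credits: far over the cap
```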
OpenAI has shifted Codex‑backed coding access in ChatGPT Business and Enterprise plans from flat subscriptions to explicit token‑based pricing.
Other vendors such as Windsurf and Cursor have similarly raised their Pro tiers and replaced simple request limits with monthly credit or quota systems, while Claude‑based coding workloads now frequently require application programming interface (API)-level billing on top of the base subscription.
So, what’s going on?
Crackdowns and drawbacks
“I think the assumption was that somehow things get more efficient over time, and then also token consumption will be less of a problem,” says Baltes. “Now I think we have the opposite thing happening.”
Eduardo Ariño de la Rubia, professor of practice at the Central European University, argues that vendors aren’t changing prices arbitrarily: the products themselves have changed. Users may feel that they signed up for Claude Code or Codex and are now being squeezed, but the tools now have a far heavier token profile.
“A coding tool in April 2026 is very different from the exact same coding tool – even the exact same brand name – it was in January,” he says.
What was initially a prompt-and-response interaction has become something more complicated as agentic systems and tool calls have taken hold. That means the costs for the companies providing this inference have shot up. Internal documentation suggests the cost of serving GitHub Copilot customers has nearly doubled since January.
Nobody’s yet settled on a model for usage, pricing, or rate limits, reckons Trisha Gee, a software developer who previously worked for JetBrains and Gradle. What had been a rough draft of how things could work has changed. “People are using the tools in a way we didn’t anticipate,” she says. “They didn’t think developers were going to spin up thousands of agents doing long-running processes.”
Finding a new path
Still, that’s the new reality we’re in. The onus is now on engineering managers overseeing teams of developers to decide how to react. The options broadly fall into two buckets: keep using the tools provided by the leading AI firms and swallow the costs, or switch to running local models.
Miklos Koren, professor of economics at the Central European University and founder of the MicroData research group, believes “local inference is going to become more important.” He adds: “my prediction is going towards more decentralized LLMs [large language models] and local source solutions.”
Yet while local LLMs are powerful, they’re not a one-to-one replacement for the inference speed and graphics processing unit (GPU) capacity that the big tech providers can offer.
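For teams weighing the local route, the switch can be small at the code level when the local runtime exposes an OpenAI-compatible endpoint. The sketch below assumes Ollama running on its default local port with a coding model already pulled; the model name is illustrative, and the quality and latency trade-offs described above still apply.

```python
# A minimal sketch of pointing an existing OpenAI-compatible client at a
# locally hosted model. Assumes Ollama is running on its default port and
# the model below has already been pulled; both are illustrative choices.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local Ollama endpoint, not a hosted API
    api_key="unused",                      # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="qwen2.5-coder:14b",  # any locally pulled coding model
    messages=[
        {"role": "user", "content": "Refactor this function to remove the global state."},
    ],
)
print(response.choices[0].message.content)
```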
For those firms that continue to pay the biggest providers, it’s incumbent on managers to think more deliberately about the return on investment. “Now we’re in the real, ‘prove that it creates business value’ era, not just, ‘you’re crossing tickets off faster today,’” says de la Rubia.
For devs used to all-you-can-eat token budgets, being put back within limits might be a tricky transition, which again is where managers come into their own.
“It’s a good time for senior folks to take a step back and think about what works for them and what doesn’t,” says Gee. “Where humans can really add value is figuring out what works for their people.”