
Microsoft quietly phased out Claude Code, exposing the true cost of enterprise-grade AI

Shenyi Bureau, 2026-06-24 07:06
Inside the world's largest software company, an AI programming experiment may be coming to an end. The reason has nothing to do with strategy. It's all about the bill.

The Shenyi Bureau is a translation team under 36Kr, focusing on technology, business, the workplace, life and other fields, and mainly introducing new technologies, new ideas and new trends from abroad.

In December last year, Microsoft informed thousands of its engineers, product managers and designers that they could use Claude Code, a command-line programming agent developed by Anthropic, at the company's expense.

By spring, the tool had spread well beyond the engineering department into non-technical roles that, in earlier waves of enterprise software, might have waited years for access. Inside Microsoft, the rollout was framed as a learning exercise; to the outside world, the signal was blunter.

The world's largest software company, a giant with its own foundation models and its own programming assistant, was paying out of its own pocket to put a competitor's product in its employees' hands.

Six months later, the experiment is being wound down. According to Windows Central and other outlets, following an exclusive by The Verge, Microsoft is cutting most direct access to Claude Code within its "Experience and Devices" division (the division responsible for building Windows, Microsoft 365, Outlook, Teams and Surface).

Affected engineers have been told to migrate to GitHub Copilot CLI by June 30, the last day of Microsoft's fiscal year. The official reason is toolchain consolidation; the unspoken one is written on the fiscal calendar.

This large-scale retreat from Claude Code is the clearest signal to date that, at current token prices, the unit economics of enterprise AI programming simply don't work. That is not because the tools aren't useful. On the contrary, it is precisely because they are so useful that engineers use them constantly, and that heavy use is what has broken the financial math.

The most obvious evidence is at Uber. Unlike Microsoft, it doesn't have such a strong financial cushion. Uber's Chief Technology Officer, Praveen Neppalli Naga, revealed to The Information in April that the company had exhausted its planned AI programming budget for the whole year of 2026 in just four months.

Naga's data shows that as of March, Claude Code adoption among the company's roughly 5,000 engineers had soared from 32% to 84%. Some individual engineers were running up $500 to $2,000 a month in token costs. About 70% of the code committed at Uber now comes from AI, and roughly one-tenth of backend updates in production are shipped directly by AI agents without any human intervention.

"I have to start all over again," Naga said. "Because the budget I thought would be enough was quickly squandered."

Naga's remark is a microcosm of the entire industry. The forecast missed because token consumption, as a variable to predict, behaves nothing like the software licenses or account seats that finance teams are used to. Traditional enterprise software is priced by the number of users.

Token-based deals are priced by how much "thinking" the model has to do, and agent-based programming requires a great deal of it. A single session often runs for hours, spawns multiple parallel threads, and generates an enormous amount of context. That is a far cry from the code-autocompletion interactions on which the original pricing was based.
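The contrast between the two pricing models can be sketched in a few lines of Python. This is only an illustration: the seat price, the price per million tokens, and the per-user token volumes below are assumed values, not figures from any vendor or from the companies named here.

```python
def seat_cost(users: int, price_per_seat: float) -> float:
    """Monthly bill under per-seat licensing: depends only on headcount."""
    return users * price_per_seat


def token_cost(tokens_per_user: list[int], price_per_million: float) -> float:
    """Monthly bill under token pricing: depends on what each user consumes."""
    return sum(tokens_per_user) / 1_000_000 * price_per_million


# Hypothetical inputs, chosen only to show the shape of the two bills.
SEAT_PRICE = 40.0          # dollars per user per month (assumed)
PRICE_PER_MILLION = 10.0   # dollars per million tokens (assumed)

light = [2_000_000] * 100      # 100 users doing autocomplete-style work
heavy = [100_000_000] * 100    # the same 100 users running long agent sessions

print("per-seat bill:      ", seat_cost(100, SEAT_PRICE))
print("token bill (light): ", token_cost(light, PRICE_PER_MILLION))
print("token bill (heavy): ", token_cost(heavy, PRICE_PER_MILLION))
```

Under these assumptions the per-seat bill is identical for both groups, while the token bill grows fifty-fold when the same headcount shifts from light to heavy use. Headcount, the number finance teams usually forecast, says nothing about the second bill.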

For months, we have been closely monitoring the breakdown of this model. In November last year, GitHub suspended new user registrations for Copilot Pro and Pro+ because the costs generated by the agent-based workloads of paying users had exceeded the fixed monthly package prices they paid.

The company had to admit that the cost structure previously built for lightweight assistance was no longer sustainable.

This is not just a problem for Uber or Microsoft; it is the state of the entire industry. Bryan Catanzaro, vice president of applied deep learning at Nvidia, told Axios in April that for his team, the cost of compute now far exceeds the cost of employing the people themselves.

Even the chip giant itself has said so. Then in May, Fortune reported that under heavy use, a token-billed AI tool could cost more for a single task than the human engineer it was supposed to assist.

Separately, a 2024 analysis from the Massachusetts Institute of Technology (MIT), widely circulated in financial circles, found that at current pricing only about a quarter of the jobs people expect AI to replace would actually be cheaper to automate than to staff with human labor.

Compare this reality with the expenditure forecast: Gartner predicts that global AI spending will reach $2.5 trillion this year, a surge of 69% compared to 2025.

At the same time, the research firm now places generative AI in the so-called "trough of disillusionment". In a press release issued in May, it predicted that because so many proof-of-concept (PoC) projects are dying in procurement, 25% of planned AI budgets for 2026 will be pushed into 2027.

Another Gartner survey, in April, found that only 28% of AI infrastructure projects fully met the expectations set out in their business plans. These are no longer the growing pains of an adolescent technology; this is an entire market repricing.

Microsoft's withdrawal is in the middle of this repricing wave, and it is by no means accidental. There are two ways to interpret this move. The first is the official line from Microsoft: Copilot CLI is the end point of the company's strategy. Engineers can still call the Claude model within Copilot in the future. The company just wants to have a product that can be directly controlled and shaped through GitHub. This statement is true.

Microsoft could have offered this reason at any point in the past six months, and it didn't. What has changed is not the strategic logic but the bill.

The second interpretation is more persuasive and harder to ignore. Microsoft is uniquely placed to know what Claude costs at enterprise scale, because outside Anthropic's own customer base, Microsoft's engineers are among its heaviest users. According to multiple sources, Claude Code had become the most popular tool within the "Experience and Devices" division.

If the cost per user fell as usage scaled, making the numbers work, this would have been the ideal moment for Microsoft to lock in a multi-year contract on favorable terms. Instead, it chose to put the brakes on the experiment in exactly this window, just in time to close the books at the end of the fiscal year.

When the most powerful giant at the negotiation table decides to abandon a supplier that even its own employees prefer, the signal it sends has nothing to do with "preference".

Whether this amounts to a bubble depends on how you define one. The unit price of a token will indeed keep falling; over the past three years it has dropped to roughly one-tenth of its previous level every 18 months. The more interesting question is whether the price of a single token is falling faster than the number of tokens consumed per task is rising.
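The race between the two trends can be put as a small calculation: the cost of a task is the price per token multiplied by the tokens the task consumes. The Python sketch below takes the tenfold price drop every 18 months from the figure above; the token growth rates are purely hypothetical inputs chosen for illustration.

```python
def relative_task_cost(months: float,
                       token_growth_per_18mo: float,
                       price_drop_per_18mo: float = 10.0) -> float:
    """Cost of one task after `months`, relative to its cost today.

    price_drop_per_18mo: factor by which the per-token price falls every
        18 months (10.0 follows the article's figure).
    token_growth_per_18mo: factor by which tokens per task grow every
        18 months (a hypothetical input).
    """
    periods = months / 18.0
    price = price_drop_per_18mo ** -periods      # relative price per token
    tokens = token_growth_per_18mo ** periods    # relative tokens per task
    return price * tokens


# Three assumed growth rates for tokens per task, over one 18-month period.
for growth in (5.0, 10.0, 50.0):
    print(f"tokens x{growth:g} per 18 months -> "
          f"task cost x{relative_task_cost(18, growth):.2f}")
```

If tokens per task grow fivefold in the period, a task ends up costing about half as much; at tenfold growth the cost is flat; at fiftyfold growth it is about five times higher, despite the cheaper tokens. Falling token prices lower the bill only if consumption per task grows more slowly than the price falls.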

The current evidence points the other way. By design, the new generation of agent systems consumes more tokens per unit of work, because the agents reason for longer, plan more carefully, and repeatedly check their results against the outside world.

Anthropic's own infrastructure team has said publicly that a single query in a reasoning workload consumes several orders of magnitude more compute than a traditional chat conversation. The models due over the next 12 months are a bet on exactly this kind of workload, and it is this bet that has forced Uber's CTO to start all over again.

Our earlier reporting offers a vivid example. In April, Anthropic banned a popular open-source agent framework called OpenClaw from running on consumer Claude subscriptions, after finding that a single instance running autonomously for one day could consume the equivalent of $1,000 to $5,000 in API costs. At the time, the framework was running on a Max plan that cost only $200 per month.

The arbitrage was so blatant that Anthropic added a specific restriction to its terms of service. Scale the same consumption pattern up to the entire engineering organization of a Fortune 500 company, and you get Uber's budget overrun.
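The size of the gap follows directly from the figures above. A short Python sketch, using only the $200 monthly plan price and the $1,000 to $5,000 daily range reported in the article (the variable names are ours):

```python
MAX_PLAN_MONTHLY_USD = 200.0      # flat monthly subscription price
DAILY_API_COST_LOW_USD = 1_000.0  # low end of one day's API-equivalent usage
DAILY_API_COST_HIGH_USD = 5_000.0 # high end of one day's API-equivalent usage


def days_cost_vs_monthly_fee(daily_cost_usd: float, monthly_fee_usd: float) -> float:
    """How many monthly fees a single day of autonomous operation is worth."""
    return daily_cost_usd / monthly_fee_usd


low = days_cost_vs_monthly_fee(DAILY_API_COST_LOW_USD, MAX_PLAN_MONTHLY_USD)
high = days_cost_vs_monthly_fee(DAILY_API_COST_HIGH_USD, MAX_PLAN_MONTHLY_USD)
print(f"one day of usage = {low:g}x to {high:g}x the monthly fee")
```

In other words, a single day of autonomous operation was worth between 5 and 25 months of subscription revenue, before counting any other day in the month.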

The opposing view is also grounded in reality and deserves a mention. Compare the cost of a useful AI programming agent with the cost of hiring one more senior engineer and, even at current prices, the agent usually still comes out ahead when measured per feature delivered. The productivity gains are well documented, and the substitution is already happening. The problem is not the product's value proposition.

The problem lies in the procurement model. Enterprises that thought they had bought a productivity tool discovered they had in fact signed a pay-as-you-go utility contract, and whenever no one was watching, the meter spun wildly. The fix may be simple: set a budget cap for each engineer, offer tiered access for high-leverage roles, or limit how much agents are allowed to run.
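A per-engineer cap with tiers might look something like the following Python sketch. It is a minimal illustration, not a description of any company's actual system: the tier names, the dollar caps, and the `EngineerBudget` class are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical tiers and monthly caps, in dollars.
TIER_CAPS_USD = {"standard": 300.0, "high_leverage": 2000.0}


@dataclass
class EngineerBudget:
    """Tracks one engineer's token spend against a tiered monthly cap."""
    tier: str
    spent_usd: float = 0.0

    @property
    def cap_usd(self) -> float:
        return TIER_CAPS_USD[self.tier]

    def try_charge(self, cost_usd: float) -> bool:
        """Record the spend and return True, or refuse if it would exceed the cap."""
        if self.spent_usd + cost_usd > self.cap_usd:
            return False
        self.spent_usd += cost_usd
        return True


budget = EngineerBudget(tier="standard")
print(budget.try_charge(250.0))  # within the cap, so the session runs
print(budget.try_charge(100.0))  # would exceed the cap, so it is refused
```

The design choice worth noting is that the check happens before the spend is recorded, which turns an open-ended meter into a bill with a known ceiling per person and per tier.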

Many large buyers have already started doing this. But what this conveys is that the era of "equipping each employee with a Claude Code account" is coming to an end. The model that will replace it in the future will be more like the pay-as-you-go cloud computing model of AWS, rather than the fixed software license model of Office.

This is what Microsoft's low-key email to its Windows and Surface teams really announces. It does not mean the end of AI programming, nor the end of the partnership between Anthropic and Microsoft; after all, the Claude models can still be called through Copilot CLI.

It announces the end of the exploration phase, in which the world's largest software companies were willing to pay token costs at almost any price just to learn and figure things out. That class is now over.

What follows is the real battle. Enterprises will keep buying AI programming tools, because the productivity gains are real and competitive pressure leaves them no room to back down. But from now on they will buy AI the way they buy electricity: with a usage limit, a closely watched meter, and the finance team in the room when decisions are made.

At some point this spring, in a meeting room at Microsoft, someone stared at the Claude Code invoice, weighed it against the Copilot CLI product roadmap, and made a decision.

The same calculation is now under way in the office of every CFO whose company joined the December 2025 rollout wave. The retreat will not be high-profile. It will take the form of emails sent just before the fiscal year closes. Until that deadline arrives, no one will notice that the budget has already run out.

Translator: boxi.