
It took burning through $500 million in a month to wake up: Treating tokens as KPIs is the costliest pitfall in AI transformation. Amazon removed the leaderboard overnight.

AI Frontline, 2026-06-02 18:42
The "token-maxxing" narrative is turning the upstream's growth myth into a cost disaster for the downstream.

Recently, more and more companies have found that before AI could truly transform their businesses, the token bills transformed them first. The most expensive pitfall in AI transformation is treating token usage as an employee KPI.

The boss of a certain company granted company-wide access to Claude with a wave of his hand, but forgot to set a limit. In one month, it burned through $500 million, equivalent to more than 3 billion RMB. By the time the finance department noticed, a bill worth hundreds of millions of dollars was already on the way.

How was the $500 million burned? Digging into the details showed that a large portion came from employees repeatedly hitting errors while running tasks and manually clicking "Retry" over and over again.
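The failure mode described here, unbounded manual retries with no spending ceiling, is the kind of thing a per-task guard can contain. The following is a minimal sketch, not a description of what any company in this article actually deployed. The names `RetryPolicy`, `run_with_budget` and `BudgetExceeded`, and the default limits, are all hypothetical.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a failing task has used up its token budget."""


@dataclass
class RetryPolicy:
    max_attempts: int = 3        # hypothetical cap on attempts per task
    token_budget: int = 200_000  # hypothetical per-task token ceiling


def run_with_budget(task, policy: RetryPolicy):
    """Call task() until it succeeds, the attempt cap is hit, or the budget runs out.

    `task` is any callable returning (ok, tokens_used).
    Returns (ok, attempts, tokens_spent).
    """
    spent = 0
    for attempt in range(1, policy.max_attempts + 1):
        ok, used = task()
        spent += used
        if ok:
            return True, attempt, spent
        # A failed attempt that exhausts the budget stops the loop for good.
        if spent >= policy.token_budget:
            raise BudgetExceeded(f"spent {spent} tokens after {attempt} attempts")
    return False, policy.max_attempts, spent
```

With a guard like this, a task that keeps failing stops after a fixed number of attempts or a fixed spend, whichever comes first, instead of running for as long as someone keeps clicking "Retry".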

If it were just one company slipping up, it would be bad luck. The problem is that similar "accidents" are everywhere.

Someone inside Meta created a list called Claudeonomics, which counts who uses AI the most. In 30 days, the whole company burned through more than 60 trillion tokens. Just the "top spender" alone accounted for 281 billion tokens, nearly $500,000 per month.
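The Meta figures can be sanity-checked with simple arithmetic. The sketch below uses only the numbers quoted above, treating "nearly $500,000" as a round $500,000 and "more than 60 trillion" as exactly 60 trillion. Both are simplifying assumptions, so the results are rough orders of magnitude rather than reported figures.

```python
# Figures quoted above for the Claudeonomics list (rounded, see assumptions).
TOP_SPENDER_TOKENS = 281_000_000_000   # 281 billion tokens in 30 days
TOP_SPENDER_COST_USD = 500_000         # "nearly $500,000", rounded up
COMPANY_TOKENS = 60_000_000_000_000    # "more than 60 trillion", used as a floor


def usd_per_million_tokens(cost_usd: float, tokens: int) -> float:
    """Blended price implied by a bill and a token count."""
    return cost_usd / (tokens / 1_000_000)


def share_of_total(part: int, total: int) -> float:
    """Fraction of the total accounted for by one part."""
    return part / total


rate = usd_per_million_tokens(TOP_SPENDER_COST_USD, TOP_SPENDER_TOKENS)
share = share_of_total(TOP_SPENDER_TOKENS, COMPANY_TOKENS)
print(f"implied blended rate: ${rate:.2f} per million tokens")
print(f"top spender's share of company tokens: {share:.2%}")
```

Under these assumptions the implied blended rate is about $1.78 per million tokens, and the top spender accounts for less than half a percent of the company's total.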

Uber initially equipped 5,000 engineers with Claude Code. The usage rate soared within a few months. As a result, just after the first quarter, the CTO complained bitterly: The entire AI programming budget for 2026 has been burned through ahead of schedule.

Chinese companies are no exception. At the Alibaba Cloud Summit, miHoYo's technical lead mentioned that an employee built dozens of agents for a project and burned through 2 million RMB worth of tokens in one night.

Why do token bills get out of control?

A Goldman Sachs report from May this year, "Decoding the Agentic Economy", explains why: in agentic mode, the model has to cycle continuously through "thinking, retrieving, using tools, re-reading the full context", so its token consumption is 1,000 times that of ordinary Q&A mode.
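Why re-reading the full context is so expensive can be seen in a toy model. This is an illustrative sketch, not the methodology of the Goldman Sachs report: it assumes every agent step re-sends the whole accumulated context as input and then appends a fixed number of new tokens. The function names and the sample sizes are invented for the example.

```python
def single_turn_tokens(prompt_tokens: int, answer_tokens: int) -> int:
    """Tokens processed by one ordinary question-and-answer exchange."""
    return prompt_tokens + answer_tokens


def agent_tokens(prompt_tokens: int, tokens_added_per_step: int, steps: int) -> int:
    """Tokens processed by an agent that re-reads its full context on every step."""
    total = 0
    context = prompt_tokens
    for _ in range(steps):
        total += context                  # the whole context is re-sent as input
        total += tokens_added_per_step    # new reasoning, tool calls, tool results
        context += tokens_added_per_step  # and all of it stays in the context
    return total


qa = single_turn_tokens(prompt_tokens=1_000, answer_tokens=500)
agent = agent_tokens(prompt_tokens=1_000, tokens_added_per_step=2_000, steps=30)
print(qa, agent, agent / qa)
```

With these made-up inputs, the single exchange processes 1,500 tokens and the 30-step agent run processes 960,000, a ratio of 640. The point is the shape of the curve: because each step pays again for everything that came before, total consumption grows roughly with the square of the number of steps.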

You think it's thinking, but in fact, it's burning money. Model companies, cloud providers, and chip companies tacitly package "using more AI and burning more tokens" as progress in advanced productivity.

Look at Anthropic's wealth-creation spree, with quarterly revenue expected to pass $10 billion and a valuation above one trillion dollars, and at Jensen Huang's soaring growth curve, and it becomes easier to understand. The money "accidentally" burned by downstream companies becomes real revenue on the model companies' financial reports. The same money, seen from two perspectives: one calls it growth, the other calls it an accident.

Amazon takes the lead

Burning through hundreds of millions of dollars may be an extreme case, but the phenomenon of burning money for the sake of burning money has long been a common problem among tech giants.

Finally, Amazon couldn't stand it anymore and took the first step.

There was a list called KiroRank inside Amazon, posted on its Kiro developer platform, ranking engineers by their token consumption.

Combined with the company's earlier strict requirement that more than 80% of employees use AI every week, the list quickly pushed employees into "token-maxxing": sending agents off on unnecessary tasks, burning tokens to climb the rankings, and treating a high ranking as job security.

Eventually Dave Treadwell, a senior vice president at Amazon, could no longer sit still and stressed at an internal meeting: don't use AI just for the sake of using it.

The list was immediately taken offline, and the new metric became "normalised deployments", which measures whether engineers have used AI to deliver truly useful code, rather than simply counting how many tokens they've burned.

As a cloud-computing giant, Amazon of course believes in AI, but it has also demonstrated one thing: AI usage metrics are too easily corrupted.

If you count tokens, employees will inflate token usage; if you count the number of prompts, employees will fabricate prompts; if you imply that "not using AI means falling behind", employees will find ways to prove they're not lagging behind.

Economics has a name for this, Goodhart's Law: when a measure becomes a target, it ceases to be a good measure.

In the AI era, this can be rephrased as: When tokens become a KPI, they're no longer a productivity metric but an automatically inflating cloud bill. In the past, it was about competing on working hours; now, it's about competing on token usage. In the past, KPIs were inflated; now, cloud bills are inflated. Although technology has advanced, the nature of the workplace remains the same.

Amazon isn't the only one hitting the brakes.

Shopify has replaced its token ranking list with a more neutral usage dashboard and added a circuit-breaker mechanism. Duolingo once wanted to include AI usage in performance evaluations but later withdrew the plan. Microsoft has also started to scale back licenses for some external AI programming tools.
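The article gives no details of Shopify's mechanism, so the following is only a generic sketch of what a spend circuit breaker can look like: once a budget would be exceeded, the breaker trips and blocks further requests until someone resets it. The class name `SpendCircuitBreaker`, its methods, and the dollar-based daily limit are all assumptions made for illustration.

```python
class SpendCircuitBreaker:
    """Blocks further AI requests once a spending limit would be exceeded."""

    def __init__(self, daily_limit_usd: float):
        self.daily_limit_usd = daily_limit_usd
        self.spent_usd = 0.0
        self.open = False  # an open circuit means requests are blocked

    def allow(self, estimated_cost_usd: float) -> bool:
        """Return True if a request with this estimated cost may proceed."""
        if self.open:
            return False
        if self.spent_usd + estimated_cost_usd > self.daily_limit_usd:
            self.open = True  # trip the breaker; stays open until reset()
            return False
        return True

    def record(self, actual_cost_usd: float) -> None:
        """Add the actual cost of a completed request to the running total."""
        self.spent_usd += actual_cost_usd

    def reset(self) -> None:
        """Close the circuit again, e.g. at the start of a new budget period."""
        self.spent_usd = 0.0
        self.open = False


breaker = SpendCircuitBreaker(daily_limit_usd=100.0)
if breaker.allow(estimated_cost_usd=60.0):
    breaker.record(actual_cost_usd=60.0)
```

The design choice worth noting is that the breaker stays open after it trips, even for small requests. Spending resumes only after a deliberate reset, which turns a silent overrun into a decision someone has to make.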

Your cost, Jensen Huang's asset

The money that has been burned doesn't disappear into thin air. One company's loss is often another company's gain on the balance sheet.

Take Anthropic. Its revenue in the first quarter was $4.8 billion, and it's expected to double to $10.9 billion in the second quarter. What's driving this growth? It's not ordinary users chatting, but corporate APIs, Claude Code, and the organizational impulse of countless companies to "fully embrace AI" and "let agents run first".

Then look at NVIDIA, the "shovel seller". Its revenue in the latest quarter was $81.6 billion. This isn't just belief in AI; it's real cash flow.

Seen in this light, Jensen Huang's words take on a different meaning. He said at GTC Taipei yesterday that, from an industrial perspective, tokens have become a unit of asset and revenue. The clever part of this statement is that it quietly changes the subject.

Token consumption, on the books of ordinary enterprises, is clearly a cost, an expense that the finance department will question about its value.

Only on the books of upstream manufacturers are tokens truly assets: the more tokens, the more inference; the more inference, the more of Jensen Huang's GPUs, networking gear, and liquid-cooling systems get sold. On Anthropic's books they're also assets: every additional round of agent operation at an enterprise eventually turns into Anthropic's revenue and profit margin.

But on the books of downstream enterprises, tokens are first and foremost a cost. Costs can be worth incurring, but they only deserve to be called assets when they buy shorter processes, less rework, and stronger delivery. Tokens burned for the sake of rankings and a show of being advanced are just a more expensive form of formalism.

The battlefield for AI efficiency improvement isn't on the token ranking list.

Of course, this doesn't mean retreating into conservatism; that would forfeit the value of investing in AI. The problem is that many companies have a shallow understanding of AI implementation.

They think that giving employees accounts is AI transformation, that an increase in usage rate means organizational progress, and that burning more tokens means deeper AI usage.

Uber's COO, whose company was among the first to embark on this "transformation", shared what he learned after going all in: we did deliver more code, but it's hard to equate that with "creating more useful features for users".

This isn't an isolated case. The code-analysis company GitClear analyzed 220 million lines of code and found that, with AI assistance, the amount of code needing rework within two weeks of completion increased by 9 times, and copy-pasted duplicate code increased by 8 times. In many cases, enterprises are simply replacing human inefficiency with the model's more expensive inefficiency.

Real AI-driven gains in organizational efficiency don't happen on the token ranking list but deep in business operations. The hardest part of enterprise AI implementation isn't giving employees accounts but getting the model into the workflow.

This is why OpenAI and Anthropic are now recruiting Forward-Deployed Engineers at high salaries to go deep into customers' internal operations, breaking down processes, managing permissions, and building integrations: they've realized that API-level delivery alone doesn't get AI into production.

Upstream companies will of course continue to tell the token story, but that's someone else's growth story. If ordinary enterprises don't clarify their own business problems and process structures first and rush to incorporate others' stories into their KPIs, they'll end up as supporting roles in others' financial reports.

Your accident is someone else's revenue. Being able to burn tokens isn't a skill; being able to explain "how this money has improved the organization" is.

This article is from the WeChat official account "AI Frontline" (ID: ai-front), written by Qing He, and published by 36Kr with authorization.