GPT-5.5 doubles, Gemini triples: How long can this price hike game last?
How long can the price-hike game of cutting-edge large models last?
Since January this year, the rental price of GPUs has more than tripled.
According to the "Memory Price Tracking Report" released by Counterpoint in February, memory prices have risen 80% to 90% quarter-on-quarter since the first quarter of 2026, an unprecedented surge.
Naturally, the increase has been passed on downstream.
The "Gradient Update" report just released by Epoch AI did a straightforward thing: it calculated the number of Tokens that all Blackwell chips in the world can process and then compared it with the actual demand.
The conclusion comes down to one word: insufficient.
The Token flood engulfs everything
Let's first look at the supply side.
Epoch AI's model is based on Kimi K2.6, a trillion-parameter MoE architecture with 32 billion active parameters.
With an input-to-output ratio of 8000:1000, the theoretical limit of the global Blackwell cluster is about 20 billion output Tokens per second.
Sounds like a lot? Let's do the math: it's enough for everyone on earth to use 7 million Tokens per month.
But this is the ideal case. Once the context window is extended to 128k, throughput falls roughly 50-fold, to about 500 million Tokens per second.
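The supply-side arithmetic can be checked with a short sketch. The throughput figures come from the text. The 30-day month and the world population of 8 billion are assumptions added here for illustration.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month

peak_output_tps = 20e9     # ~20 billion output Tokens/s at 8k:1k (from the text)
long_context_tps = 500e6   # ~500 million output Tokens/s at 128k context (from the text)
world_population = 8e9     # assumption: roughly 8 billion people


def tokens_per_person_per_month(tokens_per_second: float, population: float) -> float:
    """Spread a sustained throughput evenly over a population for one month."""
    return tokens_per_second * SECONDS_PER_MONTH / population


short_ctx = tokens_per_person_per_month(peak_output_tps, world_population)
long_ctx = tokens_per_person_per_month(long_context_tps, world_population)
slowdown = peak_output_tps / long_context_tps  # ratio of the two rounded figures

print(f"Short context: {short_ctx / 1e6:.1f} million Tokens per person per month")
print(f"Long context:  {long_ctx / 1e3:.0f} thousand Tokens per person per month")
print(f"Implied slowdown from the rounded figures: {slowdown:.0f}x")
```

Under these assumptions the short-context figure comes out at about 6.5 million Tokens per person per month, close to the 7 million quoted. The two rounded throughput figures imply a drop of about 40 times, in the same range as the roughly 50-fold drop cited.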
Now let's look at the demand side.
Google has just revealed that it processes about 1.2 billion Tokens (input + output) per second.
At a request ratio of 8k:1k, that works out to about 130 million output Tokens per second. Exponential View estimates that Google accounts for about 25% of global Token demand.
This means that current global Token demand can barely be met even if the entire capacity of the Blackwell cluster is devoted to expensive trillion-parameter models.
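The demand-side estimate follows from three numbers in the text: Google's total throughput, the 8k:1k request ratio, and Google's estimated 25% share. A minimal sketch, with variable names chosen here for illustration:

```python
google_total_tps = 1.2e9                    # input + output Tokens/s (from the text)
input_tokens, output_tokens = 8_000, 1_000  # 8k:1k request ratio (from the text)
google_share = 0.25                         # Exponential View estimate (from the text)
long_context_supply_tps = 500e6             # Blackwell long-context ceiling (from the text)

# Share of all processed Tokens that are output Tokens under the 8k:1k ratio.
output_fraction = output_tokens / (input_tokens + output_tokens)

google_output_tps = google_total_tps * output_fraction
global_output_tps = google_output_tps / google_share
utilisation = global_output_tps / long_context_supply_tps

print(f"Google output: {google_output_tps / 1e6:.0f} million Tokens/s")
print(f"Global output: {global_output_tps / 1e6:.0f} million Tokens/s")
print(f"Share of long-context ceiling: {utilisation:.0%}")
```

Scaling Google's roughly 133 million output Tokens per second up by its 25% share puts global demand at about 530 million, on par with the 500 million long-context ceiling.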
But at what speed is the demand growing?
Ten times a year.
Since 2024, the number of Tokens processed by Google has increased tenfold annually, and the growth rates of other suppliers are similar.
What about the supply side? The global AI computing power increases by 3.4 times a year, and the chip memory bandwidth increases by 4.1 times a year.
The supply grows by 3.4 times while the demand grows by 10 times. The gap is widening every year.
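How fast the gap widens can be illustrated by compounding the two growth rates. The sketch below assumes both rates stay constant, which is a simplification made here and not a forecast from the source.

```python
demand_growth = 10.0  # demand multiplies by 10 each year (from the text)
supply_growth = 3.4   # computing power multiplies by 3.4 each year (from the text)


def gap_multiplier(years: int) -> float:
    """Factor by which demand outgrows supply after the given number of years."""
    return (demand_growth / supply_growth) ** years


for year in range(1, 4):
    print(f"After {year} year(s): demand/supply ratio grows {gap_multiplier(year):.1f}x")
```

At these rates the ratio of demand to supply roughly triples every year, reaching about 25 times its starting value after three years.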
Meta employees burn 1 million Tokens a day
The shortage of computing power is not an abstract number.
Let's see what's going on inside the enterprises.
According to The Information, Meta's roughly 85,000 employees consume 60 trillion Tokens per month.
That works out to about 1 million output Tokens per employee per day.
Apple is even more aggressive.
Some engineering teams are allowed to spend $300 on Tokens every day. At Kimi K2.6 prices, that is enough for one person to generate 25 million output Tokens a day.
These are just two companies.
About 14 million software engineers around the world are using AI every day.
If their usage intensity reaches the level of Meta or Apple employees, the global Token throughput demand will soar to 200 million to 4 billion Tokens per second.
4 billion.
Blackwell's long-context ceiling, meanwhile, is 500 million. The gap is roughly an order of magnitude.
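The 200 million to 4 billion range can be reproduced from the figures above: 14 million engineers, each using between 1 million (the Meta level) and 25 million (the Apple level) output Tokens per day. A minimal sketch, with names chosen here for illustration:

```python
SECONDS_PER_DAY = 24 * 3600

engineers = 14e6                 # software engineers using AI daily (from the text)
meta_level_daily = 1e6           # output Tokens per person per day (from the text)
apple_level_daily = 25e6         # output Tokens per person per day (from the text)
long_context_supply_tps = 500e6  # Blackwell long-context ceiling (from the text)


def required_tps(users: float, tokens_per_user_per_day: float) -> float:
    """Sustained Tokens per second needed to serve the given daily usage."""
    return users * tokens_per_user_per_day / SECONDS_PER_DAY


low = required_tps(engineers, meta_level_daily)
high = required_tps(engineers, apple_level_daily)
shortfall = high / long_context_supply_tps

print(f"Low end:  {low / 1e6:.0f} million Tokens/s")
print(f"High end: {high / 1e9:.2f} billion Tokens/s")
print(f"High end vs long-context ceiling: {shortfall:.1f}x")
```

The low end comes out at about 160 million Tokens per second and the high end at about 4 billion, roughly eight times the long-context ceiling.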
Claude Code slows down developers by 19%
More embarrassing things are happening.
The latest research from METR shows that in real-world tests, Claude Code made senior developers 19% slower at completing tasks.
Installation growth for VS Code-related plugins has flattened noticeably since the beginning of the year.
The slowdown in the growth of coding tools may be due to two reasons: one is the tightening of computing power resources, and the other is that many enterprises have used up their annual AI budgets.
In sharp contrast, the prices of cutting-edge models are still rising.
The subscription price of ChatGPT Pro has been raised, Claude's API price has soared, and Gemini has seen the sharpest increase, with prices tripling in some scenarios. GPT-5.5's pricing has outright doubled.
You use more and pay more, but the results may not be any better.
Enterprises are quick to do the math.
Fleeing to DeepSeek
An escape route has taken shape.
The training cost of DeepSeek V3 is only 1/10 to 1/20 of that of cutting-edge models, and the API price is as low as 1/16 of similar models.
What about the performance? It's comparable to GPT-5.
A post on Hacker News has gone viral: an 11-month ROI model that teaches enterprises step by step how much they can save each year by switching from GPT-5.5 to DeepSeek.
The consensus in the comment section is simple: the pricing power of cutting-edge models is collapsing.
When an open-source model can achieve 90% of the effect at 1/16 of the price, price hikes are no longer a sign of confidence but an accelerator for customer loss.
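The trade-off behind that claim can be put in rough numbers. The sketch below treats "effect" as a single scalar, which is a simplification made here for illustration. Only the 90% and 1/16 figures come from the text.

```python
relative_effect = 0.90    # open-source model's effect relative to a cutting-edge model (from the text)
relative_price = 1 / 16   # open-source model's price relative to a cutting-edge model (from the text)

# Effect obtained per unit of spend, relative to the cutting-edge model (which scores 1.0).
effect_per_dollar = relative_effect / relative_price

# Fraction of the bill saved by switching at unchanged Token volume.
cost_saving = 1 - relative_price

print(f"Effect per dollar vs cutting-edge model: {effect_per_dollar:.1f}x")
print(f"Cost saving at equal Token volume: {cost_saving:.2%}")
```

On these figures, every dollar spent on the cheaper model buys about 14 times as much effect, at the cost of a 10% lower ceiling.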
Tokenmaxxing, where enterprises frantically increase Token usage to extract AI value, was originally the growth narrative of cutting-edge models.
But now, according to The Information, this strategy is backfiring on the profit margins of AI companies themselves.
The more users, the bigger the losses. Raise prices to stop the bleeding, and users leave.
A classic death spiral.
The endgame before the computing-power cliff
Let's take a broader view.
Cutting-edge laboratories (OpenAI, Anthropic, Google DeepMind) account for only 20% to 30% of global AI computing power.
The remaining 70% to 80% is held by enterprises for their own use, cloud service providers, and inference service providers.
This means that even the top laboratories cannot close the supply-demand gap by building their own computing power. Like everyone else, they are competing for the same batch of chips.
The computing power increases by 3.4 times a year, and the demand increases by 10 times a year. This gap will not disappear automatically.
Smaller models are indeed absorbing some of the demand, as the rise of the distillation layer shows. But improving capabilities keep creating new demand.
The AI industry is standing on the edge of a cliff.
It's not a technological cliff, as the models are still getting stronger. It's an economic cliff, where the math doesn't add up.
When GPU rental prices soar, API prices skyrocket, the performance of open-source alternatives approaches that of cutting-edge models, and the ROI of coding tools is questioned, a core question emerges:
Is the moat of cutting-edge models intelligence or computing power?
If the answer is computing power, then whoever controls the chips controls the future of AI. If the answer is intelligence, then DeepSeek, which can achieve similar effects at 1/16 of the price, is already shaking this answer.
Reference materials:
https://counterpointresearch.com/en/insights/Memory-Prices-Surge-Up-to-90-From-Q4-2025
https://www.signalbloom.ai/posts/outsourcing-plus-localai-will-soon-become-more-economical-vs-frontier-labs/
https://news.ycombinator.com/item?id=48278610
This article is from the WeChat official account “New Intelligence Yuan”, author: ASI Revelation; editor: David. It is published by 36Kr with authorization.