
Will the shortage of computing power turn China's AI into a bubble?

胡说成理, 2026-06-23 10:42
China's AI sector lacks affordable computing power and cannot build sufficient production capacity; without a capacity backstop, it risks seeing a bubble burst.

Yann LeCun, one of the so-called "Godfathers of AI", recently made a rather alarming prediction: the AI bubble is closer to bursting than we think, and it could burst at any time. The same inference applies to China's AI industry.

His point is incisive: a bubble doesn't mean AI has no value; on the contrary, it is too valuable. But if demand for AI cannot be met at a low enough computing power cost, many companies' revenue will fall short of expectations, their market value will collapse, and the bubble will burst.

So what this article really wants to explore is not whether Zhipu or Kimi is smarter, nor whether DeepSeek's approach is more ingenious. It is a more practical question: when everyone is queuing up for Tokens, can China build freely available, abundant and steadily cheaper production capacity for its AI supply chain?

If enough cheap Token production capacity is in place when the computing power explosion arrives, today's purchase restrictions are merely the congestion before dawn. If no such chain can be built, the computing power shortage will do more than reshape individual products: it will produce a bubble and the sound of it bursting.

01

The AI Bubble Is Not Far Away

In a CNBC interview on June 18th, LeCun ran a very simple economic calculation: although prices for high-end AI products have been rising, the cost of the Tokens needed to run them has been falling too slowly, so slowly that almost every company is using investors' money to subsidize users.

His inference is straightforward. If the Token cost ledger cannot be improved, the closed loop of sky-high valuations, sky-high market caps and rapid revenue growth won't hold. If the whole industry gets stuck in this vicious cycle, proving that even the most cutting-edge AI can't generate enough revenue, the bubble "won't last long".

This is not alarmist talk.

After Elon Musk's xAI merged with SpaceX, the combined company's valuation soared to $2 trillion, yet it lost $2.5 billion in a single quarter while earning only a little over $800 million. Anthropic, the company behind Claude, spends $1.25 billion per month renting Musk's GPUs to run its models. Even at OpenAI, which has always been stubborn on the subject, Altman has admitted that cost is now "a huge problem".

This is an old trick from the Internet era: use investors' money to subsidize users, expand first, and look for commercialization opportunities later. If none can be found, the bubble bursts. The bet in AI is even bigger: everyone is betting that the cost of inference will keep falling until one day it outpaces the rate of consumption. If the bet pays off, a new era of AI will dawn; if not, it will end up as messy as the aftermath of the Internet bubble.

The key lies in having sufficient and inexpensive Token production capacity. Yet China's AI industry currently seems to have hit a wall on computing power capacity.

This June, several of China's most powerful programming large language models were launched in quick succession. On the 13th, Zhipu open-sourced GLM-5.2, whose code-writing ability was once ranked second globally, behind only Claude. Kimi introduced K2.7 Code, which specializes in programming, and MiniMax released M3, which focuses on intelligent agents. But almost simultaneously, these companies did the opposite of selling: they tried to stop you from buying.

Today you have to scramble for Zhipu's packages every day, and the price has gone up three times in a year. The interfaces of Kimi and MiniMax are constantly overloaded, and developers are queuing up "waiting for Tokens". The tech giants are slightly better off, but they too face a shortage of high-end computing power and have issued multiple warnings to users.

A product whose unlimited supply was supposed to underpin future market value is now under the kind of "rationed supply" associated with an era of economic scarcity, which is ironic in itself.

But the alarm signal is real and clear: when a company selling digital products starts restricting purchases, it is admitting that what it sells is no longer infinitely replicable software but an industrial product with a capacity ceiling. This is the first alarm bell from the bottom of China's AI production capacity pool.

More seriously, if operating costs can't be brought down, what follows this alarm will not just be purchase restrictions, but the bubble LeCun described and the sound of it bursting.

02

Why Can't Shareholders and Investors Be Relied On?

The current computing power shortage is not caused by consumer-facing products with high daily active users, such as Doubao. What really sent demand soaring is this year's collective bet, by large language model makers in China and around the world, on AI programming and intelligent agent scheduling frameworks.

However chatty a conversation gets, it consumes tens of thousands of Tokens at most. A programming intelligent agent, by contrast, has to ingest an entire codebase, run commands repeatedly, modify files, and check its own work. When MiniMax demonstrated M3, the model took nearly 12 hours to reproduce a paper independently. Tasks like this consume dozens or even hundreds of times as many Tokens as a chat.

So, on the same day it released M3, MiniMax simply switched from the monthly subscription billing it had used for years to Token-based billing. For heavy users, the actual cost has doubled or tripled.

US industry players have calculated an even more extreme ratio. One developer worked out that the Tokens consumed under a $200 Claude or ChatGPT package actually cost roughly ten times as much: $2,048.
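The arithmetic behind that ratio is easy to check. The sketch below uses only the two figures quoted above (a $200 package and $2,048 of Tokens consumed); the function name `subsidy_ratio` is an illustrative assumption, not part of the developer's own calculation.

```python
def subsidy_ratio(subscription_usd: float, token_cost_usd: float) -> float:
    """Dollars of Tokens consumed per dollar of subscription revenue."""
    if subscription_usd <= 0:
        raise ValueError("subscription price must be positive")
    return token_cost_usd / subscription_usd


# Figures quoted in the text: a $200 package consuming $2,048 of Tokens.
ratio = subsidy_ratio(200, 2048)
shortfall = 2048 - 200  # amount not covered by the subscription fee

print(f"{ratio:.2f}x the subscription price, ${shortfall:,} uncovered per package")
```

On these figures, every dollar of subscription revenue is matched by a little over ten dollars of Token cost, which is the gap that Token-based billing is meant to close.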

The good old days of "running freely with a monthly subscription of just a few dozen yuan" seem to be gone forever. Moreover, this won't be limited to just this one company. The pricing logic of software that can be infinitely replicated is no longer considered suitable for AI. It is being replaced by a pricing logic based on production capacity, which will become the industry norm in the future.

In theory, these companies should not be short of computing power. Behind Zhipu stand Tencent, Alibaba, Ant Group, Meituan, and Xiaomi. Kimi's largest shareholder is Alibaba, with a 40% stake, and Tencent has also invested. With backers this powerful, sitting on vast cloud computing resources, they should not be the ones imposing purchase restrictions.

The problem is that the backers themselves are also short on resources.

The computing power shortage of 2026 is not a shortage of any single component. Supplies are running dry across the entire chain of chips, storage, packaging, networking, and data centers. The industry expects the squeeze to last at least two more years.

A person from an ICT manufacturer put it bluntly: In the past, two million yuan could buy eight GPU servers, but now it can only buy four or five. Manufacturers would rather break contracts than deliver.
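The per-server price implied by that quote can be worked out directly. This is a minimal sketch using only the figures above (a two-million-yuan budget buying eight servers before, four or five now); the variable names are illustrative.

```python
BUDGET_YUAN = 2_000_000  # budget quoted in the text


def unit_price(budget: float, servers: int) -> float:
    """Implied price per GPU server for a given budget and server count."""
    return budget / servers


before = unit_price(BUDGET_YUAN, 8)      # price when the budget bought eight
after_low = unit_price(BUDGET_YUAN, 5)   # price if it now buys five
after_high = unit_price(BUDGET_YUAN, 4)  # price if it now buys four

increase_low = after_low / before - 1
increase_high = after_high / before - 1

print(f"before: {before:,.0f} yuan per server")
print(f"now: {after_low:,.0f} to {after_high:,.0f} yuan per server")
print(f"increase: {increase_low:.0%} to {increase_high:.0%}")
```

Read this way, the quote implies that the price of a GPU server has risen by somewhere between 60% and 100%.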

So, in March, Tencent Cloud took the lead in raising prices on some products under its Hunyuan model, with some prices rising fourfold. Alibaba Cloud and Baidu Cloud followed within hours. Even large language model companies with a solid AI cloud business behind them are running short of computing power. This is the fundamental state of China's AI production capacity today.

Of course, prices are not rising uniformly. DeepSeek permanently cut the price of its V4-Pro interface to one-fourth of the original. Xiaomi's MiMo cut its price by 90%. Tencent Cloud slashed the price of the DeepSeek models hosted on its platform by 97%, bringing the price for cache hits down to $0.025 per million Tokens, cheaper than making a phone call.
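To put the three cuts on the same scale, the sketch below converts between "cut by X%" and "fraction of the old price remaining". The helper names are illustrative. The implied pre-cut price at the end rests on an assumption the text does not confirm: that the 97% cut applies to the cache-hit price specifically.

```python
def remaining_fraction(cut_percent: float) -> float:
    """Fraction of the old price still charged after a cut of cut_percent."""
    return 1 - cut_percent / 100


def cut_percent_from_fraction(remaining: float) -> float:
    """Percentage cut implied by charging `remaining` of the old price."""
    return (1 - remaining) * 100


# DeepSeek V4-Pro: price reduced to one-fourth of the original.
print(f"DeepSeek V4-Pro cut: {cut_percent_from_fraction(0.25):.0f}%")

# Xiaomi MiMo and Tencent Cloud: cuts of 90% and 97%.
print(f"MiMo keeps {remaining_fraction(90):.0%} of the old price")
print(f"Tencent Cloud keeps {remaining_fraction(97):.0%} of the old price")

# ASSUMPTION: the 97% cut applies to the cache-hit price of $0.025 per
# million Tokens. If so, the implied pre-cut price per million Tokens is:
implied_original = 0.025 / remaining_fraction(97)
print(f"implied pre-cut cache-hit price: ${implied_original:.3f}")
```

On that assumption, the cache-hit price would have been a little over 80 cents per million Tokens before the cut.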

This may look like it contradicts the "lack of production capacity", but it is consistent with the purchase-restriction dilemma: the two are sides of the same shortage. The price cuts come from efficiency-driven companies such as DeepSeek and Xiaomi, which have genuinely lowered costs in the low-end segment by pushing caching and sparse architectures to the limit. The price hikes and purchase restrictions come from companies in the high-end programming segment, such as Zhipu, where the more capable the product, the greater the demand. Even Tencent is hedging its bets, raising the price of its own Hunyuan while subsidizing DeepSeek. Prices are neither rising nor falling across the board; they are diverging by segment.

What's more intriguing is that the tacit understanding between startup model companies and their capital backers has subtly changed.

When Internet giants invested in these model companies, especially giants that also build AI themselves, they had two motives. Technologically, they were betting on an external team as insurance for their in-house R&D. Commercially, they were using capital to lock in a long-term customer: you take my money, and then you come and buy my computing power. It looked like a win-win.

However, once computing power becomes scarce, these so-called VIP customers who took the investment start to experience the other side of the binding relationship. Some cloud giants have publicly stated that they will allocate scarce computing power to their own high-value businesses first, leaving the startups they invested in waiting in line. In essence, your landlord, your creditor, and your competitor are often the same person.

As a result, the interfaces are still overloaded, the packages are still rationed, and the money keeps burning. According to its prospectus, Zhipu spends 70% of its R&D investment on computing power and has lost 6.2 billion yuan in three and a half years. MiniMax also spends 70% of its R&D on computing power, has lost about 9.2 billion yuan in three and a half years, and its annual procurement limit with Alibaba Cloud is still rising year by year.

This shows it is now widely understood that being tied to a backer does not guarantee priority. What remains is an unsolvable problem: with limited production capacity, only those who can afford the cost and withstand it will survive.

Driven to this point, a new idea emerges: If the backers can't be relied on, can the companies rely on themselves?

03

The Awakened Ones

Model companies and investors used to enjoy a willing and harmonious relationship: you invest money, and I use it to buy your computing power. I get the computing power, and you get the profit. It seemed like a perfect match. But once the backers' pool proves not deep enough, this tacit understanding falls apart. Model companies have to wake up from this seemingly win-win dream and find their own way out.

There are only two ways out: squeezing efficiency to the extreme and embracing domestic production capacity.

I'll cover the former in the next section. For now, let's focus on adapting to domestic chips, a path on which the general direction is the same for everyone. Zhipu entrusted the training cluster for GLM-5 to Huawei Ascend, with Digital China exclusively delivering servers based on Ascend and Kunpeng. At the beginning of the year it also made GLM-Image the first top-tier multimodal model trained entirely on domestic chips.

DeepSeek has gone even further. Its V4 model, released in April, was launched on Huawei Ascend even at the cost of a delay. The underlying code was completely rewritten from NVIDIA's CUDA to Huawei's CANN, to send a signal that production capacity can be controlled.

Zhipu's adaptation to Ascend and DeepSeek's adaptation to Ascend are essentially the same thing. However, they diverge at the next step.

Zhipu still leaves the sourcing of computing power to cloud providers and shareholders. DeepSeek, in contrast, is going upstream: on one hand, it is hinting that it won't chase short-term profits; on the other, it is recruiting heavily for data center construction and management. The approach is clearly aimed at building its own gigawatt-level computing power base.

What really shows its determination is the first financing round in June. Founder Liang Wenfeng put in about 20 billion yuan himself as the largest contributor, keeping investors off the board of directors. I believe his aim is to ensure that this radical plan won't be held back by shareholders. If the path succeeds, DeepSeek will be the first pure model company in China to build its own large-scale computing power infrastructure, forging a third path between startups and giants.

Unfortunately, waking up is one thing, and being able to succeed is another. All of this depends on the domestic hardware foundation - how strong is it really?

Market share alone is indeed encouraging: in 2025, Chinese-made AI chips climbed to 40% of shipments in the domestic market. Huawei had the highest shipment volume, accounting for nearly half, and Cambricon's revenue grew more than 20-fold in a year.

But a closer look is less rosy. Chip for chip at the high end, NVIDIA's flagship still outperforms Huawei Ascend by four to six times on many indicators. Huawei has to link hundreds of chips through optical modules into super-nodes to catch up with or even surpass NVIDIA at the cluster level, at the cost of nearly four times the power consumption.

Domestic hardware can currently offer only a "working substitute"; it is still a long way from being a "superior substitute" whose better performance people would willingly pay for. Moreover, the entire market is still in short supply. The awakened ones have identified the direction, but the path of embracing domestic production capacity is itself stuck in a production capacity bottleneck.

04

Product Characteristics Changed by the Computing Power Shortage

Building your own infrastructure and moving steadily upstream is a long-term solution. To quench the immediate thirst, companies can only improve efficiency through engineering that co-designs software and hardware. In a sense, once computing power is no longer freely available, the first thing to change is a company's product characteristics.

Kimi's solution is to squeeze efficiency into its architecture.

Its inference runs on an architecture called Mooncake. Centered on the cache, it splits the "prefill" and "decoding" phases apart and schedules them separately. It then pools and reuses already computed KV caches across the entire cluster, enabling the same batch of GPUs to handle several times more requests. At the core, Kimi still has a deep partnership with Alibaba Cloud, using an elastic mix of computing power to improve task stability and utilization. It has also developed its own security gateway, avoiding the heavy burden of building its own data centers.
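The cache-pooling idea can be illustrated with a toy model. The sketch below is not Mooncake's implementation: the class `KVCachePool`, the block size, and the prefix-keyed lookup are all illustrative assumptions, and it models only the bookkeeping of which Tokens need prefill, not the actual attention computation or the separation of prefill and decoding nodes.

```python
from dataclasses import dataclass, field

BLOCK = 4  # Tokens per cache block (toy value, an assumption)


@dataclass
class KVCachePool:
    """Toy cluster-wide pool: remembers which prompt prefixes are cached."""
    blocks: set = field(default_factory=set)

    def prefill(self, tokens: list[int]) -> int:
        """Return how many Tokens must actually be computed for this prompt."""
        computed = 0
        full = len(tokens) - len(tokens) % BLOCK  # length covered by whole blocks
        for end in range(BLOCK, full + 1, BLOCK):
            # A block is identified by the whole prefix ending at it, because
            # its KV values depend on every earlier Token. Real systems would
            # use chained hashes instead of storing the prefix itself.
            key = tuple(tokens[:end])
            if key not in self.blocks:
                self.blocks.add(key)
                computed += BLOCK
        # A trailing partial block is never cached in this toy model.
        return computed + (len(tokens) - full)


pool = KVCachePool()
system_prompt = list(range(8))  # shared prefix, e.g. a common system prompt
request_a = system_prompt + [100, 101, 102, 103]
request_b = system_prompt + [200, 201, 202, 203, 204]

print(pool.prefill(request_a))  # first request: all 12 Tokens computed
print(pool.prefill(request_b))  # shared 8-Token prefix reused: 5 computed
```

In this toy run the second request pays for 5 Tokens of prefill instead of 13, because its first two blocks are already in the pool. That reuse, applied across a whole cluster, is how the same GPUs can serve more requests.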

For this reason, Yang Zhilin, the founder of Kimi, often mentions one term, Token efficiency, and showcases the team's MUON optimizer, which can double learning efficiency. He is very clear that the key to winning the competition has long since shifted from stacking resources to the efficiency of the inference system itself.

Zhipu's solution is to turn efficiency into a salable product, using an almost brute-force engineering approach. Its high-speed inference engine, TileRT, statically arranges the entire computational graph at compile time into a kernel resident on the GPU, enabling the flagship model to output about 400 Tokens per second. The ZCube network architecture, developed in cooperation with Tsinghua University, can raise the inference throughput of the same batch of hardware by 15% without adding a single GPU or changing a single line of code. It also