From chips to the cloud, why is there a full-scale shortage of computing power?
A noisy yet profound industrial transformation is taking place across the entire computing power supply chain.
In 2026, a shortage of computing power that spanned the entire industrial chain, covering chips, cloud services, servers, and data center components, swept across the globe. The scarcity of computing power and price hikes across the board permeated the entire AI industry.
Computing-power companies in global capital markets hit record highs. The NASDAQ technology index continued to climb, and NVIDIA's market value kept rising. The cloud revenues and profits of Amazon, Microsoft, and Google reached record highs. The valuations of two major AI startups, OpenAI and Anthropic, were approaching one trillion US dollars.
Similar changes occurred in the Chinese market. The NASDAQ Golden Dragon China Index continued to rise, and the computing-power sector of the A-share market was revalued. The stock prices of domestic AI chip companies such as Hygon Information, Cambricon, and Moore Threads have remained high for nearly a year. The market values of server makers like Foxconn Industrial Internet and of key suppliers of computing-power peripheral components such as Zhongji Innolight repeatedly hit new highs, and the market values of companies in niche computing-power segments kept rising.
The capital market and the industrial market are resonating. More and more people, whether at investment institutions or within the computing-power industrial chain, believe that this round of computing-power shortage is not a traditional cyclical imbalance between supply and demand. It looks more like a signal that precedes a new round of industrial transformation.
Over the past 20 years, the consensus in the entire technology industry has been that computing power would only become cheaper.
Moore's Law in semiconductors and the scale effect in cloud computing jointly drove this trend: the density of transistors on chips kept increasing, and the unit cost of computing kept falling. Cloud computing let more users schedule computing power flexibly, improving utilization and spreading costs.
In 2026, this logic seemed to fail temporarily.
The reason is that the global computing-power industrial chain has entered a state of full-scale shortage. From the GPUs (Graphics Processing Units), CPUs (Central Processing Units), and HBM (High-Bandwidth Memory) in servers, to the optical modules, copper modules, high-speed switches, power, and liquid-cooling resources in data centers, and even cloud computing and Token resources, almost everything is in short supply.
The butterfly effect began to emerge: the prices of chips and servers rose, the price of cloud services rose, and the prices of mobile phones and PCs also rose under the pressure of chip and storage costs. Even free AI products such as ByteDance's Doubao app began planning to charge fees.
The reversal of the supply-demand pattern is the root cause of this round of shortages and price hikes.
On the demand side, Agent-based AI applications are booming. AI has moved from chatting to working and has entered real production environments on a large scale. Every question-and-answer session, task execution, code generation, and Agent call uses computing power for inference and consumes Tokens. On the back of this trend, companies across the global technology and computing-power fields have launched the largest round of computing-power investment in the past decade.
The international market research institution IDC predicted in 2026 that the number of active Agents globally would increase from 28.6 million in 2025 to 2.216 billion in 2030. In other words, within five years the number of active Agents would be nearly 80 times the 2025 level.
The growth of computing-power consumption in the Chinese market is striking. Data from the National Data Administration shows that as of March this year, daily average Token calls in China exceeded 140 trillion, a 1,400-fold increase over the 100 billion at the beginning of 2024.
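The two multiples quoted above can be checked with simple division. The following is a minimal sketch; the function and variable names are illustrative, and the inputs are the figures cited in the text.

```python
def fold_increase(start: float, end: float) -> float:
    """Return how many times larger `end` is than `start`."""
    if start <= 0:
        raise ValueError("start must be positive")
    return end / start


# Figures quoted in the text
agents_2025 = 28.6e6        # active Agents in 2025 (IDC)
agents_2030 = 2.216e9       # active Agents forecast for 2030 (IDC)
tokens_early_2024 = 100e9   # daily Token calls in China, beginning of 2024
tokens_march = 140e12       # daily Token calls in China, as of March this year

agent_multiple = fold_increase(agents_2025, agents_2030)
token_multiple = fold_increase(tokens_early_2024, tokens_march)

print(f"Active Agents, 2030 vs 2025: {agent_multiple:.1f}x")
print(f"Daily Token calls, March vs early 2024: {token_multiple:.0f}x")
```

The Agent multiple comes out a little under 80, which matches the "nearly 80 times" wording.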
Demand is booming, but supply cannot keep up. On the supply side, the semiconductor and data-center industries are capital-intensive and have long cycles. Whether it is HBM from SK Hynix, Samsung Semiconductor, and Micron Technology, GPUs from NVIDIA, CPUs from Intel and AMD, or the expansion of the supply chain around data centers, adding capacity takes time.
Because demand is growing explosively while supply remains relatively insufficient, people in the cloud computing, ICT hardware, and semiconductor fields told Caijing that the trend of shortages and price hikes will last for at least one to two years.
However, this round of computing-power shortage is not a simple supply-chain crisis like those of the past. It is more like a signal that the AI industry's flywheel is starting to turn. The flywheel is simply spinning too fast for the gears of the supply chain to engage fully, so shortages and price hikes have followed.
The prelude to the era of AI at global scale has begun.
01
Unprecedented Demand for Computing Power
The growth of computing-power demand in this round is unprecedented, exceeding anything seen in the past 20 years.
The international market research institution Gartner keeps long-term statistics and forecasts on global IT spending (including data centers, equipment, software, IT services, communication services, etc.).
Gartner data shows that global data-center investment reached 505.6 billion US dollars (about 3.4 trillion yuan) in 2025, a year-on-year increase of 51.6%. It is expected to reach 788 billion US dollars (about 5.4 trillion yuan) in 2026, a year-on-year increase of 55.8%.
Caijing reviewed Gartner's global IT spending data over the past 20 years and found that the scale and growth rate of data-center investment in 2025 and 2026 are the highest in at least 20 years (from 2006 to the present).
Specifically, the capital expenditures of technology and computing - power giants in China and the United States are all in a stage of rapid expansion.
The capital expenditures of seven technology/computing-power giants in China (Alibaba, ByteDance, Tencent, Baidu, China Mobile, China Unicom, and China Telecom) were about 658.6 billion yuan in 2025, a year-on-year increase of 16%. On a conservative estimate, their capital expenditures in 2026 will exceed 683.6 billion yuan, an increase of at least about 4%.
The capital expenditures of five technology/computing-power giants in the United States (Amazon, Microsoft, Google, Meta, and Oracle) were 450 billion US dollars (about 3.1 trillion yuan) in 2025, a year-on-year increase of 70%. They are expected to reach 760 billion US dollars (about 5.2 trillion yuan) in 2026, a 69% increase.
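The 2026 growth rates quoted in this section follow from the 2025 and 2026 figures by the usual year-on-year formula. Below is a minimal sketch that recomputes them; the helper name and the dictionary layout are illustrative, and the inputs are the rounded figures from the text, so the results can differ from the quoted percentages in the last decimal place.

```python
def yoy_growth(previous: float, current: float) -> float:
    """Year-on-year growth in percent: (current - previous) / previous * 100."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous * 100


# (2025 value, 2026 value) as quoted in the text
figures = {
    "Global data-center investment (bn USD)": (505.6, 788.0),
    "China, seven giants' capex (bn yuan)": (658.6, 683.6),
    "US, five giants' capex (bn USD)": (450.0, 760.0),
}

growth = {name: yoy_growth(prev, cur) for name, (prev, cur) in figures.items()}

for name, pct in growth.items():
    print(f"{name}: {pct:.1f}% year-on-year")
```

On these inputs the Chinese capex floor works out to slightly under 4%, which is why it is best read as a rounded figure.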
The combined computing-power investment of these technology/computing-power giants in China and the United States, which the figures above put at well over 500 billion US dollars for 2025, has even exceeded the 2025 fixed-asset investment of many sovereign countries (including Germany, the United Kingdom, South Korea, Russia, Brazil, etc.).
The explosion of Agents is driving an explosion in computing-power demand. This has pushed the revenue growth rates of major global cloud-computing providers (including Amazon AWS, Microsoft Azure, Google GCP, Alibaba Cloud, and Oracle OCI) to their highest levels in the past three years.
Major cloud providers now regard Tokens as their next core growth driver. Each provider's Token revenue, and its share of total revenue, is growing rapidly, and this is even changing the product architecture and sales strategies of cloud computing.
Over the past decade, computing-power demand was measured in "card-hours" (the rental duration of chips on the cloud), the number of servers, and the number of chips. Providers cared most about how many hours of CPU/GPU cloud resources, how many CPU/GPU chips, and how many servers they sold.
With the explosion of Agents, computing power has been broken down into Tokens, a more granular resource unit that can be measured in real time and consumed continuously. The old model of one-time purchases of servers or cloud resources is shifting to a model of continuous Token consumption.
The threshold for using computing power has also dropped significantly: AI is moving from chatting and dialogue into daily work, and its Token consumption today far exceeds that of earlier AI dialogue tools.
Xin Zhou, the general manager of Baidu Smart Cloud's large-model platform, told Caijing in December 2025 that an Agent executes a series of tasks. During a task, the model continuously uses code to plan, call tools, and record execution status, and each step may trigger a new model call. A single dialogue may consume only thousands of Tokens, but a single task may consume tens of thousands or even hundreds of thousands.
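Why a multi-step task costs so much more than a single reply can be sketched with a toy model. The sketch assumes that every model call is billed for the full accumulated context plus the newly generated output, a common arrangement when no context caching is applied. The `Step` class, the function name, and all per-step Token counts are hypothetical and are not taken from any platform mentioned in the article.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    new_input_tokens: int  # Tokens this step adds to the context (tool output, plan, status)
    output_tokens: int     # Tokens the model generates in this step


def task_token_usage(system_prompt_tokens: int, steps: list[Step]) -> int:
    """Total Tokens billed across all model calls in one task (toy model)."""
    context = system_prompt_tokens
    total = 0
    for step in steps:
        context += step.new_input_tokens
        # Assumption: each call is billed for the whole context plus its output.
        total += context + step.output_tokens
        # The output is carried forward as context for later steps.
        context += step.output_tokens
    return total


# Hypothetical sizes: one chat turn versus a 20-step Agent task.
single_chat = task_token_usage(1000, [Step("chat", 1000, 300)])
agent_task = task_token_usage(1000, [Step(f"step-{i}", 1000, 300) for i in range(20)])

print(f"Single chat turn: {single_chat} Tokens")
print(f"20-step Agent task: {agent_task} Tokens")
```

Because the context grows with every step, total consumption grows faster than the number of steps, which is consistent with the jump from thousands to hundreds of thousands of Tokens described above.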
Caijing tried several different tasks on ByteDance's Volcengine Ark and OpenAI's Codex platforms. Daily dialogue consumed fewer than 1,000 Tokens. Having the AI read and analyze an article consumed more than 5,000. Analyzing the PDF files of a company's financial reports for 24 quarters consumed more than 100,000. Building a small web application for a company's financial analysis consumed Tokens on the order of hundreds of millions.
The growth of Token consumption has led to a shortage of computing power in both China and the United States. Salespeople from Alibaba Cloud and Amazon AWS both told Caijing that the computing-power market in 2026 is a seller's market: as long as there is computing power, it can be sold.
On May 13, on the earnings conference call for Alibaba's fourth fiscal quarter of 2026 (the first calendar quarter of 2026), Wu Yongming, the CEO of Alibaba Group, indirectly confirmed this. He said that not a single card in Alibaba's servers is currently idle.
The growth of Token consumption has also driven the growth of Token revenue at technology and computing-power companies. Although it accounts for only a single-digit percentage of each company's cloud business, it is growing extremely fast.
Caijing exclusively learned that as of May 13, Alibaba Cloud's daily average Token revenue had increased more than fivefold compared with the beginning of April. Monthly revenue has now reached hundreds of millions of yuan. (For details, see "Exclusive | Alibaba Cloud's Daily Average Token Revenue Has Increased Five-Fold Compared to the Beginning of April")
Alibaba's management disclosed on the earnings conference call for the fourth fiscal quarter of 2026 that the annual recurring revenue (ARR, calculated as the current month's revenue × 12) of models and applications, including the Bailian MaaS (Model as a Service) platform, is growing rapidly. This ARR exceeded 8 billion yuan in the fourth fiscal quarter of 2026, is likely to exceed 10 billion yuan in the first fiscal quarter of 2027, and will exceed 30 billion yuan by the end of the 2027 fiscal year. People at Alibaba Cloud told Caijing that this revenue comes mainly from Tokens.
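The ARR definition given above (current month's revenue × 12) can be applied in both directions. The sketch below assumes that the 8, 10, and 30 billion yuan thresholds refer to ARR and shows the monthly revenue each one implies; the function names are illustrative.

```python
def arr_from_monthly(monthly_revenue: float) -> float:
    """ARR as defined in the text: current month's revenue times 12."""
    return monthly_revenue * 12


def monthly_from_arr(arr: float) -> float:
    """Monthly revenue implied by a given ARR under the same definition."""
    return arr / 12


# Thresholds quoted in the text, in yuan (assumed here to be ARR figures)
arr_thresholds = {
    "Q4 FY2026": 8e9,
    "Q1 FY2027": 10e9,
    "End of FY2027": 30e9,
}

for period, arr in arr_thresholds.items():
    monthly = monthly_from_arr(arr)
    print(f"{period}: ARR {arr / 1e9:.0f} bn yuan implies about {monthly / 1e6:.0f} mn yuan per month")
```

Under this reading, an ARR of 8 billion yuan corresponds to monthly revenue in the hundreds of millions of yuan, the same order of magnitude as the monthly Token revenue reported for Alibaba Cloud.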
Volcengine, the cloud and AI business under ByteDance, disclosed in April this year that as of March, the daily average Token usage of the Doubao large model exceeded 120 trillion. In December 2025, this figure was 63 trillion. That is to say, it has nearly doubled in three months.
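The "nearly doubled in three months" observation can be restated as an implied monthly growth rate. This is a minimal sketch that assumes steady compound growth between December 2025 and March, which the source does not claim; the function name is illustrative.

```python
def implied_monthly_growth(start: float, end: float, months: int) -> float:
    """Constant monthly growth rate that takes `start` to `end` in `months` months."""
    if start <= 0 or months <= 0:
        raise ValueError("start and months must be positive")
    return (end / start) ** (1 / months) - 1


# Daily average Token usage of the Doubao large model, as quoted in the text
usage_dec_2025 = 63e12
usage_march = 120e12

multiple = usage_march / usage_dec_2025
monthly_rate = implied_monthly_growth(usage_dec_2025, usage_march, months=3)

print(f"Three-month multiple: {multiple:.2f}x")
print(f"Implied compound monthly growth: {monthly_rate:.1%}")
```

Under that assumption, the implied rate is roughly a quarter more usage each month.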
Caijing learned from multiple sources at the beginning of 2026 that Volcengine's Token revenue from external business (excluding ByteDance's internal business) in 2025 far exceeded 1 billion yuan. At the end of 2025, Volcengine set a growth target of at least doubling its Token revenue in 2026. In 2026, as the Doubao video model Seedance 2.0 became popular, Volcengine's Token revenue target continued to be significantly raised.
A senior strategic planner at a Chinese ICT (Information and Communication Technology) hardware company told Caijing in the second half of 2025 that he had run a sensitivity analysis (a method for dynamic market analysis based on changes in external market conditions) on Volcengine's possible future Token revenue. The results showed that Volcengine's Token revenue may rise to 10 billion yuan in the next one to two years.
The trend in the US market is similar, and the Token revenues of Amazon and Google's cloud businesses are also increasing significantly.
Amazon AWS's Token consumption in the first quarter of this year exceeded that of all previous quarters combined. Amazon disclosed on its first-quarter 2026 earnings conference call that customer spending on its MaaS platform Bedrock increased by 170% quarter-on-quarter. Caijing learned that Bedrock's annual revenue is currently in the billions of US dollars, a single-digit percentage of Amazon AWS's 128.7 billion US dollars in revenue in 2025.
As early as the third-quarter 2025 earnings conference call, Amazon AWS management said that in the long run, Bedrock's revenue contribution will be comparable to that of EC2. Caijing learned that EC2 is Amazon AWS's core computing product, with annual revenue of more than 40 billion US dollars, accounting for more than 30% of total revenue. By this prediction, Bedrock's future revenue will reach tens of billions of US dollars.
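The shares mentioned for EC2 and Bedrock can be cross-checked against the 128.7 billion US dollar revenue figure. The sketch below uses the 40 billion US dollar floor quoted for EC2 and treats "single-digit percentage" as anything from 1% to 9%, which is an interpretive assumption; the names are illustrative.

```python
def share_percent(part: float, whole: float) -> float:
    """Share of `part` in `whole`, in percent."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole * 100


aws_revenue_2025 = 128.7   # bn USD, as quoted in the text
ec2_revenue_floor = 40.0   # bn USD, lower bound quoted in the text

ec2_share = share_percent(ec2_revenue_floor, aws_revenue_2025)

# Assumption: a "single-digit percentage" means between 1% and 9%.
bedrock_low = aws_revenue_2025 * 0.01
bedrock_high = aws_revenue_2025 * 0.09

print(f"EC2 share at the 40 bn floor: {ec2_share:.1f}%")
print(f"Bedrock range for a single-digit share: {bedrock_low:.1f} to {bedrock_high:.1f} bn USD")
```

Both results are consistent with the text: the EC2 floor is just over 30% of revenue, and a single-digit share of AWS revenue lands in the billions of US dollars.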
In April this year, Mai-Lan Tomsen Bukovec, the vice president of technology at Amazon AWS, told Caijing at a small briefing that inference is becoming a routine application. She also said that this was the original intention of Bedrock: users don't have to be AI experts, and any developer can use inference through APIs (Application Programming Interfaces).
Google has repeatedly disclosed its Token growth on earnings conference calls over the past six months. Google's management said on the first-quarter 2026 earnings conference call that Google's first-party models currently process more than 160 billion Tokens per minute, up from 100 billion in the previous quarter. In the past 12 months, 330 Google Cloud customers have each processed more than 1 trillion Tokens. Among them, 35 have reached 10 trillion Tokens.
Tokens have transformed computing power from a one-time infrastructure investment into a continuously consumed resource billed in real time. More