
The fire of the large model war has spread to the edge: A computing power revolution that reconstructs the industrial landscape

晓曦 · 2025-12-05 10:51
AI is shifting from the cloud to the edge, where edge-side computing power and models are evolving in tandem.

In 2025, as Google launched Project Suncatcher and OpenAI opened the gates of Stargate, the global AI industry seemed to be rushing headlong into a cloud-based contest centered on "computing power supremacy". Meanwhile, another, quieter revolution was taking place on terminal devices.

In May, OpenAI acquired io, the AI hardware company founded by former Apple Chief Design Officer Jony Ive, for $6.5 billion, planning to launch its first screenless AI hardware product by the end of 2026. In November, Elon Musk predicted that traditional mobile phones would disappear entirely within the next 5–6 years, replaced by devices serving only as "AI inference edge nodes". In December, ByteDance entered the fray with its Doubao AI phone, causing quite a stir.

The flames of the large-model war are spreading from the "front-line battlefield" of cloud computing power to the "capillaries" of billions of terminal devices. An edge-side race over the future landscape of AI has officially begun.

01. The Epic Evolution of Computing Technology Drives the Transfer of Computing Power Hegemony

The development of computing technology has never been a linear process but is driven by paradigm shifts:

In 1945, John von Neumann's "stored-program" concept laid the technical foundation for general-purpose computing.

In 1946, the birth of the ENIAC electronic computer opened the chapter of the general-purpose computing era. Computing centered on the CPU solved the problem of "the existence of computing".

In 2006, NVIDIA released the CUDA architecture, pushing the GPU from graphics rendering into general-purpose computing. Its parallel computing power was more than 100 times that of the CPU, marking the official arrival of the accelerated-computing era.

In 2012, AlexNet, the first neural network model trained on the CUDA platform, reduced the ImageNet image-recognition error rate from 26% to 15% while consuming 90% less computing power than the CPU-based solution, officially announcing the start of the deep-learning era.

The breakthrough of generative AI in 2020 pushed accelerated computing to new heights. The emergence of LLMs (Large Language Models) and VLMs (Vision-Language Models) created demand for "cognitive-level computing".

Unlike traditional tasks, large-model computing places extreme requirements on parallel processing power and massive data throughput, demanding more of both computing power and bandwidth. In particular, VLM models must process visual and language data simultaneously to form a closed loop of "visual perception – language understanding – decision generation", accelerating the development of cloud AI chips.

The rapid iteration of large models has broken the monopoly of cloud AI chips and spawned a collaborative "cloud training, edge-side deployment" ecosystem. Cloud computing power keeps stacking up as cloud models push towards ever-larger parameter scales, while edge-side models focus on extreme compression to achieve optimal performance within limited computing power. The cloud, as the "brain" of AI, is responsible for large-model training, global decision-making, and knowledge management; the edge side, as the "nerve endings" of AI, handles active perception, real-time decision-making, and user interaction.

Today, cloud computing power has completed the historical transformation from CPU-dominated general-purpose computing to GPU-centered intelligent computing. According to data from the international TOP500 organization, in 2019 nearly 90% of TOP500 supercomputer computing power relied entirely on CPUs; by 2025, this figure had dropped sharply to less than 15%. In other words, the transfer of computing-power hegemony from Intel to NVIDIA was completed in just six years.

02. The Edge Side Is Set to Become the New Battlefield for AI Development

The ultimate value of AI lies not in laboratory parameters but in its ability to transform the real world and in society's adoption of the technology. The high latency and high cost of cloud-model deployment make it ill-suited to edge-side scenarios in industry and consumer applications. Hence model distillation, billed as "giving small models great wisdom", emerged. It compresses trillion-parameter models down to the hundred-billion or even billion-parameter scale, significantly reducing model size and computational complexity while maintaining relatively high performance, enabling deployment in edge-side products such as AI PCs, local meeting-note all-in-one machines, AI mobile phones, and AI gateways.

According to Frost & Sullivan forecasts, the global edge-side AI market will reach 1.2 trillion yuan in 2029, a compound annual growth rate of 39.6%. AI PCs already account for more than 30% of Lenovo's total PC shipments. Annual sales of the intelligent meeting device Plaud have exceeded one million units. Meta's AI glasses sold out within two days of release, with sales expected to exceed ten million in 2026. Edge-side AI is fast becoming a must-win battleground for technology giants.

China has natural advantages in edge-side AI. On the one hand, it receives high-level attention in top-level policy design; on the other, it has a huge edge-side market and rich application scenarios. Frost & Sullivan forecasts that China's edge-side AI market will reach 307.7 billion yuan in 2029, a compound annual growth rate of 39.9%.

Policy documents such as the "15th Five-Year Plan Suggestions" and the "Implementation Guidance for the 'Artificial Intelligence +' Initiative" explicitly call for fully implementing the "Artificial Intelligence +" initiative, seizing the high ground of AI industrial applications, and comprehensively empowering all industries. By 2030, the end of the 15th Five-Year Plan period, the goal is a smart-terminal penetration rate of over 90% and an industrial scale of over 10 trillion yuan, injecting strong and certain impetus into the large-scale explosion of edge-side AI. Policy guidance with quantitative indicators not only gives enterprises a clear direction for R&D and market promotion but also pushes edge-side AI from scattered pilot projects towards large-scale implementation.

China is the world's largest producer of consumer electronics, household appliances, and automobiles, providing huge market demand for edge-side AI chips and solutions. The world's most complete edge-side AI industrial chain has formed in China: upstream, chip makers such as Huawei Ascend, Horizon Robotics, Rockchip, and Houmo Intelligence provide hardware support; midstream, companies such as DeepSeek, Alibaba, and iFlytek provide algorithms and models; downstream, terminal manufacturers such as Honor, Lenovo, and Xiaomi handle product implementation, forming an ecological advantage that is difficult to replicate.

The resonance between policy support and market demand will drive rapid growth of the domestic edge-side AI market and push large models from the cloud down to billions of terminal devices such as mobile phones, PCs, automobiles, and robots.

03. The Sinking of Large Models and Their Mutual Attraction with Edge-Side Computing Power

"Killer applications" have emerged for cloud - based large models: the latest disclosed weekly active users of ChatGPT have reached 800 million, and the monthly active users of Doubao and Deepseek have also reached 172 million and 145 million respectively. However, the killer app for edge - side large models is still on the way.

This difference comes down to two things. On the one hand, under edge-side power and cost constraints, the performance of edge computing chips sets the physical floor for running edge-side models. On the other hand, the iteration of model quantization and compression capabilities determines the models' software ceiling.

Will the Cloud's CPU-to-GPU Hegemony Shift Repeat on the Edge Side?

Let's take a look at two underlying logics:

1. AI-Dominated Computing Has Already Happened in the Cloud

As noted above, the CPU:GPU ratio in cloud computing architecture flipped from 9:1 in 2019 to 1:9 in 2025, and the emergence of large models in 2020 clearly accelerated this revolution in the computing paradigm. Looking ahead from the end of 2025, will the edge-side computing paradigm follow the cloud's new pattern, with AI computing chips dominant and the CPU auxiliary?

As in the cloud, if traditional data processing, retrieval, query, and recommendation on the edge side are re-presented to users in the form of AI computing, users will vote with their feet. With a DeepSeek model installed on a PC or mobile phone, it can directly access important work files on the computer, videos in the phone's photo album, chat records, and so on (locally there are no privacy or latency issues), which can not only help us work more efficiently but also make conversations deeper and more interesting.

In the cloud, large models have completely changed the rules of the game, and the accelerating AI flywheel is irresistible. Therefore, replicating the cloud story on the edge side is just an engineering problem.

2. Data Transmission in Data Centers Has a Physical Limit, So Edge-Side AI Is Not Optional

Elon Musk said on the Joe Rogan podcast that local inference at edge nodes, collaborating with servers, is not one option among many but the only architecturally feasible path forward.

The industry generally designs for a bandwidth requirement of roughly 25–50 Mbps per channel of operational-grade 4K streaming. Taking the conservative 25 Mbps per user against the total submarine-cable bandwidth of 997 Tbps (i.e., 997,000,000 Mbps) in 2023, the theoretical maximum number of simultaneous 4K streaming users is about 39,880,000 (roughly 40 million). That is nowhere near enough to support ChatGPT's current user base, let alone super-apps like WeChat. Complete rendering and computation on the server side would therefore require unrealistic data-transmission rates, possibly exceeding current global bandwidth capacity. It is unrealistic for the Internet and the cloud to carry this load; that share of perception and computation must be handled on the edge side.
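The capacity argument above is a simple division; a back-of-envelope check using the figures from the text (25 Mbps per 4K user, 997 Tbps of submarine-cable bandwidth in 2023) confirms the roughly 40 million ceiling:

```python
# Back-of-envelope check: how many simultaneous 4K streams can the
# quoted global submarine-cable bandwidth theoretically carry?
PER_USER_MBPS = 25                      # conservative per-channel 4K bitrate
TOTAL_TBPS = 997                        # total submarine-cable bandwidth, 2023
total_mbps = TOTAL_TBPS * 1_000_000     # 1 Tbps = 1,000,000 Mbps
max_users = total_mbps // PER_USER_MBPS
print(f"{max_users:,} simultaneous 4K users")  # 39,880,000
```

Even this generous ceiling, which ignores all other Internet traffic, falls short of ChatGPT's disclosed 800 million weekly actives by an order of magnitude.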

In summary, the hegemony shift from cloud CPU to GPU is essentially the computing paradigm being screened by "efficiency and scenario fit". Both underlying logics also hold on the edge side: the user demand for AI-enabled experiences is irreversible, and the physical limit of data transmission cannot be broken. The transformation of the edge-side computing architecture is therefore not accidental but the inevitable result of technological evolution and real-world needs acting together.

Edge-Side Replication: The Symbiotic Evolution of Models and Computing Power

1. Accelerated Implementation of Model "Slimming"

The MoE architecture lets large models "slim down" while maintaining performance, clearing the architectural obstacles to edge-side deployment. After distillation, a model's size and computational complexity fall significantly while performance stays relatively high. Huawei's CBQ quantization algorithm compresses a model to 1/7 of its original volume while retaining 99% of its performance; Alibaba's reverse-distillation technique makes a 2B model perform 8.8% better than a 7B model. These breakthroughs in model compression lower the computing-power threshold, enabling deployment in edge-side scenarios such as AI PCs, AI mobile phones, and innovative AI hardware.
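To make the compression idea concrete, here is a minimal sketch of symmetric per-tensor int8 post-training quantization, the general family of techniques the paragraph refers to, not Huawei's actual CBQ algorithm. An fp32 weight (4 bytes) is stored as int8 (1 byte) plus a single fp32 scale, roughly a 4x size reduction before any further pruning or distillation; the function names are illustrative:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: map the largest |weight| to 127."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate fp32 weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)   # stand-in weight tensor
q, s = quantize_int8(w)
err = float(np.abs(dequantize(q, s) - w).max())    # bounded by scale / 2
print(f"max reconstruction error: {err:.5f} (scale = {s:.5f})")
```

Real schemes layer calibration data, per-channel scales, and outlier handling on top of this, which is how they push well past 4x while keeping near-original accuracy.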

2. Surge in Edge-Side Computing Power Demand

With the spread of multimodal large models such as VLMs, which must process visual details and text logic simultaneously, the complexity of multimodal data processing far exceeds that of pure text, driving a significant increase in computing-power demand. For example, Qwen3 VL 8B needs at least dozens of TOPS to run. Moreover, as agents develop rapidly and repeatedly invoke multiple models, inference computing power will grow exponentially.
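A rough sketch of where "dozens of TOPS" comes from: prefill (prompt processing) costs about 2 × parameters operations per token, and a long multimodal prompt of image patches plus text must be digested within an acceptable latency. All numbers below are illustrative assumptions, not vendor-published requirements for Qwen3 VL 8B:

```python
# Back-of-envelope compute estimate for prefilling an ~8B-parameter VLM.
params = 8e9            # ~8B parameters (assumption)
prompt_tokens = 2048    # assumed multimodal prompt: image patches + text
target_seconds = 1.0    # assumed acceptable prefill latency
ops = 2 * params * prompt_tokens          # ~3.3e13 operations total
tops_needed = ops / target_seconds / 1e12
print(f"~{tops_needed:.0f} TOPS to prefill in {target_seconds:.0f} s")
```

Agent workloads multiply this further, since each tool call or reflection step can trigger another full prompt pass through one or more models.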

3. Significant Leap in Edge-Side Computing Power Supply

Demand growth has stimulated computing-power supply. In the pre-large-model era, edge-side chips usually offered only a few TOPS. With the arrival of AI PCs, Intel and Qualcomm chips reached dozens of TOPS. Among domestic chips, the Rockchip RK182X carries an independent NPU with 20 TOPS, and the Houmo Intelligence M50 reaches 160 TOPS. For the first time, edge-side NPU computing power has exceeded 100 TOPS, and it is expected to keep evolving towards high computing power, high bandwidth, and low power consumption.

04. Currently, the Main Landscape of Edge-Side Computing Power Is the "Collaborative Development of SOC + NPU"

1. Edge-Side SOC Chips: General-Purpose Basic Solutions

These chips are built around ARM CPUs with lightweight integrated NPUs, emphasizing "cost-effectiveness + universality". The industry generally adopts a hybrid "CPU + GPU + NPU + ISP" architecture, suited to small edge-side models with 100 million to 1 billion parameters and used mainly in scenarios such as smart speakers, customized tablet devices, and smart door locks, where AI performance requirements are relatively low and cost control is the focus. Take the Rockchip RK3588: it pairs 4 big ARM A76 cores and 4 little A55 cores with a 6-TOPS NPU, a combination of powerful CPU cores and a small-computing-power NPU. It is a typical pre-large-model-era product, mainly for control with a modest amount of AI capability for scenarios such as image-centric general security. The Allwinner H88K, Juxin JX100, and Hengshuo HS610 have weaker AI capabilities than Rockchip's parts. As in the cloud of old, the current edge-side computing paradigm is still CPU-dominated, except that ARM replaces x86 on the edge.

2. Edge-Side NPUs: Extreme-Performance Solutions

Although the GPU has become the core of cloud AI computing, its high power consumption is a significant drawback on the edge. NPUs are therefore gradually becoming mainstream for edge-side AI computing.

GPUs exist in two forms, iGPU (integrated graphics) and dGPU (discrete graphics); likewise, NPUs come as iNPU (integrated into the SOC) and dNPU (an independent accelerator card). Pursuing extreme AI performance requires not only sufficient chip computing power but also sufficient bandwidth. Here the dNPU is the best choice, since it does not have to compete for bandwidth with the CPU cores, GPU core, and ISP inside the SOC. The dNPU also has the advantage of flexible configuration and can be combined with SOCs of different performance according to the scenario.
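The bandwidth point can be made quantitative: autoregressive decoding is memory-bound, because generating each token streams the entire weight set through the NPU, so tokens per second is capped at roughly bandwidth divided by model size. The sketch below uses an 8 GB int8 model and a 51.2 GB/s link; the 50% penalty for an iNPU contending with CPU/GPU/ISP traffic is an illustrative assumption, not a measured figure:

```python
# Why dedicated bandwidth matters: decode throughput ceiling for a
# memory-bound model is bandwidth / model size (tokens per second).
def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb

model_gb = 8.0                                   # ~8B params at int8, 1 byte each
dedicated = max_tokens_per_s(51.2, model_gb)     # dNPU with its own DRAM
shared = max_tokens_per_s(51.2 * 0.5, model_gb)  # assumed 50% loss to contention
print(f"dedicated: {dedicated:.1f} tok/s vs shared: {shared:.1f} tok/s")
```

The ceiling halves with the bandwidth, regardless of how many TOPS the NPU advertises, which is exactly why a dNPU with its own memory channel pulls ahead.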

Currently, NPUs with relatively strong performance on the market, such as the Huawei Atlas 200I A2, Bitmain BM1684X, Rockchip RK182X, and Houmo Intelligence M50, can be adapted to models with 30 billion to 100 billion parameters.

Huawei Atlas 200I A2: a traditional architecture with a maximum Int8 computing power of 20 TOPS, 25 W power consumption, and a maximum bandwidth of 51.2 GB/s. It can be deployed on devices such as drones and robots, with the advantages of high integration and a complete software-hardware ecosystem.

Bitmain BM1684X: a traditional architecture with 32 TOPS single-chip computing power and 15–33 W power consumption. It excels in the breadth and maturity of industry applications and suits scenarios such as smart security and edge-computing servers that must process large volumes of video streams; it has been deployed in more than 270 urban projects.

Rockchip RK1820/1828: uses 3D stacked packaging with a maximum Int8 computing power of 20 TOPS; power-consumption data has not been officially disclosed. Positioned as a coprocessor, when paired with a main processor (such as the RK