
Why has edge AI become a new battleground for tech giants?

Semiconductor Industry Review | 2025-09-02 18:39
Behind a market worth hundreds of billions of dollars, why has edge AI become the industry's new anchor point?

In the course of AI's development, cloud-based AI initially dominated the industry thanks to its powerful computing capacity and centralized data processing. However, as application scenarios continue to expand, especially in fields such as the Internet of Things, autonomous driving, and industrial control, the limitations of cloud-based AI have gradually become apparent. Research by the International Data Corporation (IDC) shows that global spending on edge computing solutions will approach $261 billion in 2025, with an expected compound annual growth rate (CAGR) of 13.8%, and is projected to reach $380 billion by 2028. The retail and services sectors will take the largest share of investment in edge solutions, at nearly 28% of total global spending. These figures plainly reflect that the industry's focus is shifting from the cloud to the edge.
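As a quick sanity check on those projections, here is a minimal sketch assuming the 13.8% CAGR compounds annually from the 2025 base (the small gap versus IDC's $380 billion figure is presumably rounding):

```python
# Rough check: does a 13.8% CAGR take ~$261B (2025) to roughly $380B (2028)?
base_2025 = 261e9            # IDC estimate of 2025 edge computing spending, in USD
cagr = 0.138                 # IDC's stated compound annual growth rate
years = 2028 - 2025

projected_2028 = base_2025 * (1 + cagr) ** years
print(f"Projected 2028 spending: ${projected_2028 / 1e9:.0f}B")  # ~ $385B, consistent with IDC's ~$380B
```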

There is growing concern that artificial intelligence is slipping into bubble territory. A report from the NANDA project at the Massachusetts Institute of Technology, "The GenAI Divide: State of AI in Business 2025," found that 95% of companies have seen hardly any productivity improvement after deploying generative AI tools. Even Sam Altman, the CEO of OpenAI, admitted that investors might be overly excited about artificial intelligence and compared the current market to a bubble.

However, industry insiders believe that this criticism is mainly targeted at the cloud-based artificial intelligence market and software algorithms.

01

Why is edge generative AI needed?

Currently, the mainstream large language models on the market, from OpenAI's GPT, Google's Gemini, and Anthropic's Claude to the popular domestic DeepSeek, almost all rely on cloud computing to complete generation tasks. This remote-server model, backed by massive computing power, can easily handle demanding workloads such as large-scale model training and high-resolution image synthesis. It also scales extremely well, flexibly adapting from everyday Q&A for individual users to large enterprise-level deployments. For ordinary users, this experience is more than sufficient.

However, in enterprise-level applications or more complex scenarios, the shortcomings of the cloud-based model gradually emerge. First, latency is relatively high, and the response speed of complex tasks is easily affected by network fluctuations. Second, it depends heavily on the network and becomes unusable once the connection drops. Most importantly, there is a data privacy risk: large amounts of raw data must be uploaded to the cloud for processing, which not only increases bandwidth costs but may also lead to data leakage through vulnerabilities in transmission or storage. This is particularly troublesome for sensitive fields such as healthcare and finance.

This is where the advantages of edge generative AI stand out. It deploys generation capabilities directly on local devices, whether mobile phones, surveillance cameras, autonomous vehicles, or industrial machine tools. Data processing is completed entirely on the device, and sensitive information never needs to leave it, ensuring privacy and security at the source. At the same time, the low latency of edge AI is a "savior for real-time scenarios": autonomous driving requires millisecond-level road-condition judgment, and industrial automation relies on immediate equipment fault warnings; edge AI fits precisely these scenarios with extremely demanding response-time requirements. More importantly, it does not need to transmit data frequently, greatly reducing bandwidth requirements. Even in remote areas without a network or in industrial workshops with weak signals, it can operate independently, and its stability and reliability far exceed those of the cloud-based model.

The technological prototype of edge intelligence can be traced back to the 1990s, when it emerged in the form of a content delivery network (CDN). Its initial positioning was to provide network services and video content distribution to users nearby through servers distributed at the network edge. The core goal was to divert the load pressure on the central server and improve the efficiency of content transmission and access.

However, with the explosive growth of Internet of Things (IoT) devices and the popularization of 4G and 5G mobile communication technologies, the global data generation volume has climbed exponentially, gradually entering the zettabyte (ZB) era. Against this background, the traditional cloud computing architecture has gradually shown its shortcomings: All data needs to be transmitted to the cloud for processing, which not only causes high bandwidth consumption but also leads to high latency problems due to the transmission distance. At the same time, the cross-network transfer of data also brings the risk of privacy leakage, and it has become difficult to meet the requirements of scenarios with high real-time and security requirements.

After entering the 21st century, to solve the pain points of cloud computing, the concept of edge computing was officially proposed. Its core idea is to move the data processing link from the cloud to the edge nodes close to the data source. By completing the preliminary screening, processing, and forwarding of data locally, the amount of data uploaded to the cloud is greatly reduced, thereby relieving the bandwidth pressure and reducing latency. However, edge computing at this stage mainly focused on the optimization of the data processing process and had not yet been combined with artificial intelligence (AI) technology, and did not involve the deployment and application of AI algorithms.
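A minimal sketch of that idea, with a hypothetical threshold and a hypothetical upstream endpoint: raw readings are screened on the edge node, and only a compact summary is ever considered for upload.

```python
import statistics

CLOUD_ENDPOINT = "https://cloud.example.com/ingest"  # hypothetical upstream service
VIBRATION_LIMIT = 4.0                                # hypothetical alert threshold (mm/s)

def process_on_edge(raw_readings: list[float]) -> dict | None:
    """Screen raw sensor readings locally; return a compact summary only when it is worth uploading."""
    anomalies = [r for r in raw_readings if r > VIBRATION_LIMIT]
    if not anomalies:
        return None  # nothing noteworthy: no bandwidth spent, no raw data leaves the device
    return {
        "samples": len(raw_readings),
        "anomalies": len(anomalies),
        "mean": statistics.fmean(raw_readings),
        "peak": max(raw_readings),
    }

summary = process_on_edge([1.2, 0.9, 5.7, 1.1])
if summary is not None:
    print("would upload to", CLOUD_ENDPOINT, summary)  # a few hundred bytes instead of the raw stream
```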

It was not until after 2020, as AI technology matured (especially lightweight models and low-power computing), that edge computing and AI began to be deeply integrated and "edge intelligence" emerged as an integrated technology in its own right. Its core feature is deploying AI algorithms (covering both inference and training) on edge devices close to where data is generated, such as IoT terminals and edge servers. This enables real-time data processing and low-latency decision-making while avoiding uploading raw data to the cloud, ensuring data privacy and security at the source.

Looking at the development of edge intelligence, it can be clearly divided into three core stages. The first stage centers on "edge inference": model training still happens in the cloud, and the trained model is then pushed to edge devices to perform inference tasks. The second stage is "edge training": with the help of automated development tools, the entire process of model training, iteration, and deployment moves to the edge, reducing dependence on cloud resources. The third stage, which is also the future direction, is "autonomous machine learning": the goal is for edge devices to perceive and adapt on their own, completing model optimization and capability upgrades without manual intervention.
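As a rough illustration of the first stage, the sketch below assumes a model that was already trained in the cloud and exported to a local ONNX file; the file name, input tensor name, and shapes are hypothetical.

```python
import numpy as np
import onnxruntime as ort

# Stage one of edge intelligence: the model is trained in the cloud, exported, and
# pushed to the edge device as a file; only inference runs locally.
session = ort.InferenceSession("cloud_trained_model.onnx", providers=["CPUExecutionProvider"])

def infer_locally(frame: np.ndarray) -> np.ndarray:
    """Run inference entirely on-device: the camera frame never leaves the edge node."""
    batch = frame.astype(np.float32)[np.newaxis, ...]   # add a batch dimension
    (scores,) = session.run(None, {"input": batch})     # "input" is the assumed input tensor name
    return scores

# Example: a dummy 3x224x224 tensor standing in for a camera frame.
print(infer_locally(np.zeros((3, 224, 224))).shape)
```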

Of course, this does not mean that cloud-based AI will be replaced. Facing complex tasks such as ultra-large-scale model training and cross-device collaboration, the powerful computing power of the cloud is still irreplaceable. The future trend is more likely to be a complementary relationship between "cloud + edge": The cloud is responsible for the training and optimization of the underlying model, and the edge is responsible for the real-time deployment and data processing of local scenarios. The two work together to give full play to the computing power advantage of the cloud and take into account the privacy and real-time nature of the edge, ultimately promoting artificial intelligence technology to enter all walks of life more safely and efficiently.

Data source: Precedence Research; compiled by Semiconductor Industry Review

Data from the market research institution Market shows that the global edge AI market is expected to exceed $140 billion by 2032, up sharply from $19.1 billion in 2023. Data from Precedence Research shows that the edge computing market may reach $3.61 trillion in 2032 (a 30.4% CAGR). These figures indicate the broad prospects of edge AI and explain why large companies are turning their attention to this new blue ocean.

02

Industry giants are making strategic moves to gain an early advantage

In the field of edge AI chips, large companies are locked in fierce competition. As the core hardware underpinning edge AI, chips have shown parallel innovation in computing power and architecture over the past two years.

Apple is actively deploying self-developed edge AI chips in its iPhone line. Take the newly released iPhone 16 series: its A18 chip is deeply optimized for AI workloads. The A18 uses a second-generation 3-nanometer process and integrates a 16-core Neural Engine capable of 35 trillion operations per second. This computing power enables instant Face ID recognition and smooth Animoji generation, bringing response times into the millisecond era. At the same time, because the chip processes data locally, nothing needs to be uploaded to the cloud, fundamentally avoiding the privacy risks of cloud transmission and building a solid privacy defense for users.

Nvidia, a leader in graphics processing and AI computing, has also made notable progress in edge AI chips. Its Jetson series is designed specifically for edge devices such as robots, drones, and smart cameras. Take the Jetson Xavier NX: the module integrates 384 NVIDIA CUDA cores and 48 Tensor Cores, delivering up to 21 TOPS (trillion operations per second) while drawing only about 15 W. It can provide powerful visual recognition and decision-making support for robots in complex, changing environments. In logistics and warehousing scenarios, mobile robots equipped with the Jetson Xavier NX can quickly identify the locations of goods and shelves, plan optimal paths, and efficiently complete handling tasks, greatly improving the efficiency of logistics operations.

Domestic enterprises have also achieved notable results in edge AI chips. The DeepEdge10 series launched by CloudWalk in 2022 is designed specifically for edge large models. The upgraded DeepEdge200 in 2024 uses die-to-die (D2D) chiplet technology and is paired with an IPU-X6000 accelerator card. It can be adapted to nearly 10 mainstream large models, including CloudWalk's TianShu and Alibaba Cloud's Tongyi Qianwen, and enables real-time recognition of abnormal behavior in smart security cameras, shortening the early-warning response time to within 0.5 seconds.

Main products of domestic AI computing power chip companies. Source: Minsheng Securities

On the evening of August 26, CloudWalk released its 2025 semi-annual report. It shows that in the first half of 2025 the company achieved operating income of 646 million yuan, a year-on-year increase of 123.10%. Net profit attributable to the parent company was -206 million yuan, narrowing the loss by 104 million yuan year-on-year; net profit after deducting non-recurring gains and losses was -235 million yuan, narrowing the loss by 110 million yuan year-on-year. Regarding the change in performance, the company said operating income rose compared with the same period last year mainly because of higher sales revenue from consumer and enterprise-level scenario businesses, while the narrower losses were mainly due to simultaneous increases in operating income and gross margin during the reporting period.

Data source: Company financial report, compiled by Semiconductor Industry Review

Facing the reality that edge devices have limited memory and computing power, international technology giants such as Google, Microsoft, and Meta have all focused on developing and optimizing lightweight large models so that they can run efficiently on edge devices.

Google is actively exploring this field. Through careful model-architecture design and parameter tuning, it has successfully produced lightweight versions of some large models. For example, its Gemini Nano model is optimized on the Transformer architecture: while maintaining relatively high performance, it significantly reduces parameter count and computational complexity, and can run smoothly on edge devices such as smart security cameras, providing strong support for real-time video analysis. In an urban security monitoring network, cameras equipped with Gemini Nano can identify pedestrians and vehicles in real time, monitor abnormal behavior, and issue timely alerts, effectively improving urban security prevention and control.

Microsoft has taken a different approach. Although its phi-1.5 model has a relatively small parameter count, its training data selection is distinctive: the model is trained on roughly 27 billion tokens of carefully selected "textbook-quality" data and performs excellently in mathematical reasoning, surpassing some large-scale models with hundreds of billions of parameters. In intelligent tutoring systems for education, phi-1.5 can quickly and accurately answer students' math questions, provide detailed solution steps and reasoning, assist teachers in teaching, and improve teaching quality and efficiency.

03

Where is the breakout point?

Smart home devices are one of the most common application scenarios for edge AI. It lets smart home devices move from "single-instruction execution" toward "behavior-prediction-based services". Smart thermostats learn users' schedules and sleep cycles and adjust the temperature dynamically in combination with outdoor weather, maintaining comfort while cutting energy consumption by 15%-20%, far better than traditional devices. Terminals such as the Xiaodu smart speaker can respond to high-frequency instructions within 0.3 seconds with the help of edge AI, and can also link cross-brand devices into scenario-based services: a "homecoming mode", for example, automatically turns on the lights, adjusts the temperature, and plays music. This has pushed the penetration rate of smart home scenario linkage in China to 38%, above the global average.

Wearable devices are another important area for edge AI. The smart glasses jointly developed by Meta and Ray-Ban can achieve millisecond-level image recognition and local translation in cities such as Shanghai. They can convert road sign text in real time and recommend nearby stores even without a network, and the cumulative shipment volume has exceeded 2 million units. Chinese brands focus more on in-depth health management. The Huawei Watch GT series uses edge AI to integrate data such as heart rate, blood oxygen, and electrocardiogram, with an accuracy rate of up to 85% in screening for sleep apnea syndrome, helping more than 100,000 users detect health problems in advance. The OPPO bracelet adjusts the exercise intensity in real time based on users' exercise data and generates personalized plans, forming a closed loop of "collection - analysis - advice" for health management.

In the industrial field, the combination of AI, the Internet of Things, and robots is upgrading factories from "single-device automation" to "full-process intelligent collaboration". By using edge AI to process production data in real time, the entire chain of fault prediction, process optimization, and quality traceability becomes intelligent. Robots in smart factories are no longer just mechanical arms that repeat single actions; they have become "intelligent production units" with real-time decision-making capabilities. Arm's computing platform provides an efficient data-processing foundation for the industrial Internet of Things. In industrial scenarios, a single smart device generates more than 10 GB of sensor data (such as temperature, vibration, and pressure) every day. If all of it were uploaded to the cloud for processing, it would not only occupy a large amount of bandwidth but also introduce data latency (potentially several minutes). The edge computing capabilities of the Arm platform enable local data filtering and analysis - only "abnormal data" (such as