
2026 "On-Device AI Battle" Escalates: What Are Apple, Google and Others Competing For?

36Kr's Friends (36氪的朋友们) · 2026-06-22 20:05
In 2026, edge-side large models will move from merely being able to run to performing well in practice, and software-hardware collaboration will become the key driving force.

In the first half of 2026, edge-side large models entered a new phase: models keep getting smaller and lighter, but compression alone is no longer sufficient. The key going forward is to integrate the models with the underlying frameworks, chips, and specific device scenarios, moving from simply "being able to run" to "being user-friendly".

This shift is most visible among leading manufacturers. At its Worldwide Developers Conference (WWDC 2026) on June 9, Apple pushed further into edge-side large models and released the AFM3 series.

Apple's approach is to design, from the ground up, an architecture that saves compute on the edge side. It launched AFM3 Core Advanced, its main edge-side model, with approximately 20 billion parameters. The model itself is larger, but it uses a sparse architecture, so only a subset of the parameters is activated during each inference.
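To make "only a subset of the parameters is activated" concrete, here is a minimal sketch of top-k expert routing, one common way to build a sparse model. The article does not describe AFM3's internals, so the routing scheme, the expert counts, and every function name below are illustrative assumptions rather than Apple's design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, top_k):
    """Pick the top_k highest-scoring experts and normalize their weights.

    Returns a list of (expert_index, weight) pairs. Experts that are not
    selected are never evaluated, which is where the compute saving comes from.
    """
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_scores[i] for i in chosen])
    return list(zip(chosen, weights))


def active_fraction(num_experts, top_k, expert_params, shared_params):
    """Fraction of all parameters that are touched for a single token."""
    total = shared_params + num_experts * expert_params
    active = shared_params + top_k * expert_params
    return active / total


# Hypothetical configuration: 8 experts, 2 active per token.
routes = route_top_k([0.1, 2.0, -1.0, 1.5, 0.0, 0.3, -0.2, 0.9], top_k=2)
fraction = active_fraction(num_experts=8, top_k=2,
                           expert_params=1.0, shared_params=2.0)
```

In this toy configuration only 40% of the parameters take part in each step, even though the full model must still fit in storage.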

That is Apple's answer. Across the industry, however, companies still diverge on how to implement models on the edge side.

Some companies follow the "distillation route": they transfer the capabilities of a powerful large model to a smaller model, then run that small model on the device to approximate the results of an advanced large model at lower cost.

Google's Gemini Nano is a typical example. The early Gemini technical report mentioned that Gemini Nano is distilled from a larger Gemini model, is designed for edge-side deployment, and runs directly on Android phones such as Pixel and Samsung Galaxy.
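As a rough illustration of the distillation route, the sketch below computes the classic soft-target loss, in which a student model is trained to match a teacher's temperature-softened output distribution. The article does not say which objective Gemini Nano uses, so the formulation, the temperature value, and the function names here are assumptions chosen for clarity.

```python
import math


def softmax_t(logits, temperature):
    """Softmax of logits divided by a temperature (higher = softer)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax_t(teacher_logits, temperature)
    q = softmax_t(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2


# Made-up logits over a three-token vocabulary.
teacher = [3.0, 1.0, 0.2]
close_student = [2.8, 1.1, 0.3]
far_student = [0.2, 1.0, 3.0]
loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
```

A student whose outputs track the teacher's gets a small loss, while one that ranks the tokens in the opposite order is penalized heavily.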

Other manufacturers choose to redesign the model itself around edge-side constraints. Within limited compute, memory, and power budgets, they aim to maximize the capability density that each parameter of an edge-side model can carry.

This route bets on "small but powerful" edge-side large models: the model must be small enough to deploy on more terminals such as mobile phones, PCs, in-vehicle infotainment systems, and robots; at the same time, its capabilities must be comprehensive enough to support edge-side agents, real-time interaction, and local intelligent experiences.

Take Mianbi Intelligence (ModelBest), a Chinese company focusing on edge-side large models, as an example. It has long emphasized model compression and higher capability density. Starting with the MiniCPM series, it has aimed to deliver stronger model capabilities at a smaller parameter scale.

So far, Mianbi Intelligence has chosen to keep compressing its models along the low-bit route. In collaboration with Tsinghua University and the OpenBMB open-source community, it released BitCPM-CANN, which validates a training scheme for 1.58-bit ternary large models on the Huawei Ascend platform.

Mianbi Intelligence's idea is that each parameter of a large model used to occupy more storage space and computing resources. Now each parameter can be represented with only a few bits, so the model needs less compute and storage.
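The "1.58-bit" figure follows from the arithmetic of ternary weights: a value drawn from three states carries log2(3), or about 1.585, bits of information. The sketch below works through that number, estimates storage for a hypothetical one-billion-parameter model, and shows one simple way to pack five ternary values into a byte (possible because 3^5 = 243 fits in 256). The model size and the packing layout are illustrative assumptions, not details of BitCPM-CANN.

```python
import math

# Information content of one ternary weight in {-1, 0, +1}: about 1.585 bits.
BITS_PER_TERNARY = math.log2(3)


def model_size_gb(num_params, bits_per_param):
    """Storage in gigabytes (10^9 bytes) for the weights alone."""
    return num_params * bits_per_param / 8 / 1e9


def pack_trits(trits):
    """Pack exactly five values from {-1, 0, 1} into one byte (base-3 digits)."""
    if len(trits) != 5 or any(t not in (-1, 0, 1) for t in trits):
        raise ValueError("expected five values from {-1, 0, 1}")
    byte = 0
    for t in reversed(trits):
        byte = byte * 3 + (t + 1)
    return byte


def unpack_trits(byte):
    """Inverse of pack_trits: recover the five ternary values."""
    out = []
    for _ in range(5):
        out.append(byte % 3 - 1)
        byte //= 3
    return out


# Hypothetical model with one billion parameters.
fp16_gb = model_size_gb(1e9, 16)                  # 16-bit weights
ternary_gb = model_size_gb(1e9, BITS_PER_TERNARY) # theoretical ternary floor
packed_gb = model_size_gb(1e9, 8 / 5)             # five weights per byte
```

Packing five weights per byte costs 1.6 bits per weight, close to the theoretical 1.585, and roughly a tenth of the 16-bit footprint.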

Moreover, this level of compression means that upgrading edge-side large models is no longer confined to model algorithms; it now reaches into chip adaptation.

At a recent media briefing held by Mianbi Intelligence, CEO Li Dahai said: "Starting this year, as the industry has moved inference to domestic chips, we have also gradually moved our training work to domestic chips and domestic clusters."

This also points to a common trend in edge-side large models: the closer a model gets to the terminal, the more it relies on cooperation between hardware and software. Simply making the model smaller is not enough. The model needs to fit the way the chip computes, and the chip needs to be further optimized for large-model inference.

Similar moves are becoming more common across the industry. Apple has launched Core AI around Apple Silicon, and manufacturers such as Qualcomm, MediaTek, and Intel are all building their own edge-side AI platforms.

Competition in edge-side AI is shifting from comparing parameter counts and compression ratios to how well models, chips, systems, and applications work together.

However, while a consensus on edge-side large models is forming, differences are also emerging.

As models enter real devices such as mobile phones, cars, PCs, and robots, industry discussion has focused increasingly on how far the core capabilities of edge-side models can expand and where their boundaries lie: What core tasks should edge-side models take on? How should work be divided between local intelligence and cloud intelligence? What thresholds must an edge-side model cross to move from "being able to run" to "being user-friendly"?

On these questions, Li Dahai, the CEO of Mianbi Intelligence, shared his judgments on edge-side large models as they enter the deployment stage.

01. Apple's Increased Investment in the Edge Side: A Delayed "Systematic Project"

Question: In 2026, Apple continued to increase its investment in edge-side large models and launched the edge-side large model AFM3 Core Advanced, which once again made edge-side AI the focus of the industry. What do you think of the implementation progress of Apple's route? How do you view Apple's approach of entering the edge side through the "sparse route"? What kind of competitive pressure will it bring to Android phone manufacturers?

Li Dahai: I think it can be viewed from several perspectives.

First, Apple actually announced its cloud-edge collaboration strategy in June 2024, and its gradual rollout has, to some extent, come later than the industry expected. This shows that edge-side large models are not simply a model problem but a systems engineering project involving chips, systems, software ecosystems, and the definition of specific scenarios.

Second, Apple's entry into edge-side large models further demonstrates that this direction is valid. The value of edge-side models is not just about putting a small model into a mobile phone but about truly changing the way people interact with devices. Mobile phones are the most frequently used terminals by users and are closest to personal data and personal scenarios, so they are very suitable for carrying some high-frequency, real-time, and privacy-sensitive intelligent capabilities.

Third, this is not entirely a competition between Apple and Android. The key lies not in the operating system camp but in who can find more suitable chips, more efficient models, and clearer product scenario definitions.

In fact, domestic mobile phone manufacturers have been paying attention to this direction for a long time and have been working closely with model companies and chip companies. From my observation, everyone has a fairly deep understanding of edge-side intelligence, and the gap is not as large as outsiders imagine. As for Mianbi, we proposed an edge-side strategy in 2024 and have been cooperating with domestic terminal manufacturers ever since.

Question: Apple is increasing its investment in edge-side large models, and high-end Android phone manufacturers are also looking for their own edge-side AI routes. Which capabilities are key for edge-side large models to create a real difference in experience?

Li Dahai: Based on Mianbi Intelligence's experience, mobile phone manufacturers usually consider several very specific issues when evaluating edge-side models.

First is the model's own capabilities and deployment costs. For edge-side models, one cannot simply look at the parameter count or the score on a particular leaderboard. The model ultimately has to run on devices like mobile phones, so one must weigh capabilities, speed, power consumption, and memory usage together. If the model is too weak, users won't perceive its value; if it is too heavy, it will cause problems such as high power consumption, overheating, and an unstable experience.

Second is how well the model adapts to edge-side chips. The AI capabilities in mobile phones ultimately need to run on chips. Model companies cannot simply adapt after the hardware has been finalized. The ideal way is to work with chip manufacturers at an earlier stage on the model structure, inference method, memory usage, and power consumption. For example, Mianbi has cooperated with several edge-side chip manufacturers, including Qualcomm, and will also carry out more joint optimization ahead of time in some directions.

Third is the inference efficiency. Terminal devices such as mobile phones and cars have high requirements for power consumption and stability. Users won't accept an AI function that seems powerful but obviously consumes a lot of power, overheats, or has an unstable response when used. So, when the effects are similar, the one who can provide the experience with lower power consumption and lower latency has an advantage.

Apple's entry into edge-side large models will accelerate the maturity of the entire ecosystem. For high-end Android phones, the pressure will increase, but opportunities still exist. In the future, what truly determines competitiveness is whether chips, models, systems, and scenarios can form an efficient synergy. Whoever can connect these links will have a better chance of turning edge-side AI into an experience that users can truly perceive.

02. Bottlenecks in Edge-Side Implementation: The Combination of Models and Chips

Question: After entering 2026, what stage has the implementation of edge-side models reached? What are the key bottlenecks currently restricting the further large-scale application of edge-side models?

Li Dahai: In 2025, Mianbi Intelligence's edge-side model went into mass production in the automotive scenario, which is a very important milestone. This year is the second year of implementation, and edge-side models are actually growing very quickly.

However, the biggest constraint on real-world implementation of edge-side models is still what I just mentioned: the combination of models and chips.

Edge-side scenarios differ from cloud scenarios. They place high demands on power consumption, computing power, bandwidth, cost, and real-time performance. The model's capabilities are very important in themselves, but without suitable edge-side AI chips, many capabilities are hard to bring to real devices at low cost and low power consumption.

So we are very much looking forward to the mass production of a new batch of domestic edge-side AI chips with integrated storage and computing (compute-in-memory). Some of these chips are currently in the tape-out stage. Once they are deployed at scale, they are expected to provide more competitive edge-side AI capabilities in terms of power consumption, computing power, and bandwidth. With these chips, edge-side applications will take off more quickly.

In addition, we believe that the most reasonable form of edge-side AI is not to put all capabilities on the edge or to rely entirely on the cloud but to achieve cloud-edge collaboration.

For example, context management should be placed on the edge as much as possible. Important, high-frequency inference tasks with stricter privacy and real-time requirements should also be completed on the edge first, while more complex and resource-intensive tasks can be handed over to the cloud.
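The division of labor described above can be sketched as a simple routing rule. This is only an illustration of the idea: the `Task` fields, the thresholds, and the `choose_target` function are hypothetical, and a real system would weigh many more signals (battery state, connectivity, model availability).

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    privacy_sensitive: bool
    latency_budget_ms: int
    est_compute_units: float  # hypothetical cost estimate for the task


# Hypothetical thresholds, chosen only for illustration.
EDGE_COMPUTE_LIMIT = 10.0   # what the on-device model can handle
CLOUD_ROUND_TRIP_MS = 300   # assumed network plus cloud inference latency


def choose_target(task):
    """Return "edge" or "cloud" following the edge-first rule."""
    # Privacy-sensitive work stays local, where the personal data lives.
    if task.privacy_sensitive:
        return "edge"
    # Real-time interactions cannot afford a network round trip.
    if task.latency_budget_ms < CLOUD_ROUND_TRIP_MS:
        return "edge"
    # Heavy, non-urgent tasks are handed over to the cloud.
    if task.est_compute_units > EDGE_COMPUTE_LIMIT:
        return "cloud"
    return "edge"


tasks = [
    Task("summarize private messages", True, 2000, 4.0),
    Task("voice wake-up reply", False, 150, 1.0),
    Task("long research report", False, 60000, 80.0),
]
decisions = {t.name: choose_target(t) for t in tasks}
```

Only the long, non-urgent, non-private task leaves the device in this example; everything else is handled locally.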

In this mode, edge-side models will enter users' daily lives more naturally. They may not appear as an obvious "large-model product" at first but will be embedded in specific scenarios such as cars, mobile phones, PCs, wearable devices, and smart homes, becoming an intelligent experience that users can feel directly. As chips, models, and application ecosystems mature, the implementation of edge-side models will accelerate significantly, and we will also see a large number of practical applications this year.

Question: In the past, domestic AI chips were used mostly for inference, but large-model training places higher demands on software stacks, cluster stability, communication efficiency, and precision consistency. From the perspective of a model company, what difficulties does Mianbi Intelligence need to overcome when migrating training tasks to domestic chips?

Li Dahai: We are currently advancing in two main directions.

The first direction is to keep collaborating with domestic chip manufacturers on real-world training tasks. Model companies encounter many specific problems during training, such as operator performance, communication efficiency, cluster stability, and precision alignment. These problems are only fully exposed in real large-model training. Through continuous feedback, optimization, and verification, model companies and chip companies can jointly refine the domestic AI software ecosystem and make it more mature.

The second direction is to cooperate on lower-level software adaptation. The problem with domestic chips is not just the performance of a single chip. The bigger challenge is that the software stacks are not unified. Different chips have different compilation, operator, communication, and scheduling systems. If a model company has to re-adapt every time it connects to a new type of chip, the cost will be high and the efficiency low.

So we will also take part in work on the common software ecosystem, such as FlagOS, led by the Beijing Academy of Artificial Intelligence. Its aim is to consolidate repetitive adaptation work into a shared layer, so that different domestic chips have clearer interfaces and ways of working together during model training and inference. This work is very valuable for the domestic intelligent computing ecosystem and is developing rapidly.

Mianbi Intelligence is not only a large-model company but also has relatively deep experience in operator adaptation and low-level optimization, so we are more involved in both of these paths. On the one hand, we help domestic chips and software stacks discover and solve problems through real-world model training tasks; on the other hand, we also take part in the more systematic construction of the domestic AI software ecosystem.

In addition, migrating training to domestic chips is more complex than migrating inference. Inference mainly concerns throughput, latency, and cost, while training also requires verifying numerical precision, stability, and the ability to run for long periods.

For this reason, we use small-model experiments to predict the training results of large models, and we compare the test results on domestic AI chips such as Huawei's against those on the NVIDIA platform to determine whether the training precision is reliable. Such tests can expose underlying problems in chips, operators, and software stacks before large-scale training begins.
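One simple way to picture this kind of alignment check is to train the same small model with the same data and seed on both platforms, then compare the loss curves step by step. The sketch below assumes a relative-gap criterion with a 1% tolerance; the metric, the tolerance, the function names, and the loss values are all made up for illustration and are not Mianbi Intelligence's actual procedure.

```python
def max_relative_gap(reference_losses, candidate_losses):
    """Largest per-step relative difference between two loss curves."""
    if len(reference_losses) != len(candidate_losses):
        raise ValueError("loss curves must cover the same training steps")
    return max(abs(c - r) / abs(r)
               for r, c in zip(reference_losses, candidate_losses))


def is_aligned(reference_losses, candidate_losses, tolerance=0.01):
    """True when the candidate platform tracks the reference within tolerance."""
    return max_relative_gap(reference_losses, candidate_losses) <= tolerance


# Illustrative, made-up loss values sampled at the same training steps.
reference_curve = [4.00, 3.00, 2.50, 2.20]   # reference platform
candidate_curve = [4.02, 3.01, 2.49, 2.21]   # platform under test
drifting_curve = [4.02, 3.10, 2.80, 2.70]    # e.g. a faulty operator

aligned = is_aligned(reference_curve, candidate_curve)
drifted = not is_aligned(reference_curve, drifting_curve)
```

A curve that drifts steadily away from the reference, as in the third list, points to a numerical issue worth investigating before committing a large cluster to a full training run.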

Question: In the first half of 2026, products such as "Doubao Phones" have drawn outside attention to edge-side intelligent agents. How do you think edge-side models and edge-side intelligent agents will change human-machine interaction?

Li Dahai: This is a very natural development direction.

It follows from where edge-side models have an advantage in the division of labor. Compared with relying entirely on the cloud, edge-side models do better on privacy protection, real-time response, and reliability, so they are naturally suited to human-machine interaction tasks, because interaction between people and devices places very high demands on real-time performance and stability.

We can use cloud gaming as an analogy. In the era of mobile Internet, many companies have tried cloud gaming. In theory, cloud gaming places rendering on the cloud, so the terminal does not need strong computing power, but this direction has never really been successful on a large scale. The core reason is that users are very sensitive to the frame rate, latency, and stability of the interaction and do not want to experience sudden freezes without warning.

In other words, many people underestimate how much an interaction experience depends on real-time performance and reliability. Such high standards are more likely to be met on the terminal side. In fact, we had already seen how important this is back in the PC Internet era. The first company I worked for was Google. At that time, Google soon discovered that every 100-millisecond improvement in response speed had a significant impact on the advertising conversion rate.

So, for products like Doubao Phones, what is really worth watching in the combination of edge-side models and edge-side intelligent agents is the new interaction layer it may bring.

Whether an edge-side intelligent agent succeeds depends not only on the strength of the model but on three factors combined: first, how much cost the chip and computing power can bear; second, the model's overall performance in capabilities, speed, power consumption, and stability; third, whether the specific scenario is viable. Only when these three circles truly overlap can edge-side intelligent agents reach large-scale application.

In short, the chip determines whether it can run, the model determines whether it can do the job, and the scenario determines whether anyone will use it.

03. After the Implementation of Agents, More Tasks Will Return to the Edge Side

Question: Mianbi Intelligence has explored low-bit quantization down to 1.58 bits. How much room do you think remains for further compression through quantization? What are the main directions for future breakthroughs?

Li Dahai: Based on our current technical judgment, 1.58 bits may be close to the limit of model quantization. There is little theoretical room left for further compression, and the real challenge is not just reducing the number of bits but preserving the model's capabilities without significant loss at an extremely high compression ratio.

For us, the more crucial question is whether the quantization loss can be made low enough. Model compression is not simply about a smaller parameter footprint. More importantly, the model must still deliver good enough inference results with less storage, less computing power, and lower power consumption. This is also one of the most important issues when edge-side models are actually deployed.

In this regard, Mianbi Intelligence optimizes for low-bit quantization from the training stage, that is, through QAT (Quantization-Aware Training), so that the model adapts to low-bit representation from the start of training, rather than being compressed in a post-processing step after training is complete.
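The general mechanics of quantization-aware training can be sketched in a few lines: the forward pass uses quantized weights, while gradients update a full-precision "latent" copy through a straight-through estimator. The article does not describe Mianbi Intelligence's actual recipe, so the absmean ternarization rule (borrowed from published 1.58-bit work), the single-neuron model, and all names below are assumptions for illustration.

```python
def ternarize(weights, eps=1e-8):
    """Absmean ternarization: scale by mean |w|, round, clip to {-1, 0, 1}.

    Returns the ternary codes and the shared scale factor.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def qat_step(latent, x, target, lr=0.1):
    """One training step for a single linear neuron with ternary weights."""
    codes, scale = ternarize(latent)
    # Forward pass sees only the quantized weights (code * scale).
    pred = sum(c * scale * xi for c, xi in zip(codes, x))
    error = pred - target
    # Straight-through estimator: treat d(quantized)/d(latent) as 1, so the
    # gradient of 0.5 * error^2 w.r.t. each latent weight is error * x_i.
    new_latent = [w - lr * error * xi for w, xi in zip(latent, x)]
    return new_latent, 0.5 * error ** 2


# Hypothetical latent weights and a single training example.
latent_weights = [0.9, -0.05, -1.1, 0.4]
codes, scale = ternarize(latent_weights)
updated, loss = qat_step(latent_weights, x=[1.0, 0.0, 0.0, 0.0], target=0.0)
```

Because the loss is always computed on the quantized weights, the latent weights are pushed toward values that still work after ternarization, which is the contrast with compressing a model only after training has finished.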

The advantage of this method is that the model is optimized around the