Samsung is preparing a major move: an on-device large model built into the Galaxy S26. Is the on-device AI battle in the phone market about to begin?
Over the past year, phone manufacturers have become markedly more active in AI. Honor's YOYO now connects to more third-party agents, bridging AI capabilities between the system layer and the application layer. Huawei's Xiaoyi can move across apps on a single command to complete tasks for you.
As powerful as these capabilities look, breaking the features down reveals an awkward reality: in essence, these assistants still need an internet connection to work. In other words, phone AI remains stuck at the device-cloud collaboration stage and has not moved beyond it.
Recently, the X account Semi-retired-ing revealed that Samsung is preparing a large model that runs locally on the upcoming Galaxy S26 series and handles most AI functions. The model is even said to hold elevated permissions, allowing it to clear memory when necessary to free up space so that it can respond to user requests at any time.
(Image source: Oneleaks)
In fact, Samsung demonstrated a local large model called Gauss back in 2023, and it was reported that the model was pre-installed on the Galaxy S25 series. For reasons that were never explained, however, Samsung has been heavily promoting Google's Gemini instead and has barely mentioned Gauss since. Only recently has Samsung's own on-device model come up again.
At a stage when most manufacturers still rely mainly on the cloud, why is Samsung trying to put the model on the phone itself? Is it hoping to overtake the competition this way, or has mobile hardware matured to the point where large models really can be deployed locally? Whatever the answer, one thing is clear: a new stage of phone AI is about to begin.
Phone manufacturers will not abandon device-cloud collaboration
If Samsung really does deploy a large model locally, does that mean phone AI is about to abandon device-cloud collaboration in favor of purely local deployment? In practice, that is unlikely to happen in the short term.
Device-cloud collaboration is close to an ideal solution for today's phone AI. The cloud carries model scale, complex reasoning, and rapid iteration; behind that sit more abundant computing resources, plus easier model updates, unified management, and security review. The device side receives the user's first command, handling wake-up, speech recognition, and basic intent judgment, then hands complex requests off to the cloud.
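This division of labor can be pictured as a simple routing decision: resolve lightweight intents on the device, forward everything else to the cloud. The sketch below is only an illustration of that idea; the intent list, class names, and the cloud stand-in are assumptions, not any vendor's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Request:
    text: str
    needs_screen_context: bool = False
    multi_step: bool = False

# Hypothetical set of intents simple enough to resolve entirely on-device.
LOCAL_INTENTS = {"set_alarm", "toggle_wifi", "open_app", "play_music"}

def classify_intent(req: Request) -> str:
    """Placeholder for a small on-device intent classifier."""
    first_token = req.text.lower().split()[0]
    return first_token if first_token in LOCAL_INTENTS else "complex"

def call_cloud_model(req: Request) -> str:
    """Stand-in for the cloud LLM endpoint (requires a network round trip)."""
    return f"cloud handled: {req.text!r}"

def handle(req: Request) -> str:
    intent = classify_intent(req)  # runs locally, negligible latency
    if intent != "complex" and not req.multi_step and not req.needs_screen_context:
        return f"executed '{intent}' on device"
    # Complex reasoning, multi-step planning, screen understanding -> cloud.
    return call_cloud_model(req)

print(handle(Request("set_alarm for 7 am")))                        # stays on device
print(handle(Request("book a taxi after my last meeting today",
                     needs_screen_context=True, multi_step=True)))  # goes to cloud
```

The weakness of this split is visible in the last branch: every "complex" request depends on the network being there.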
For users who only use AI occasionally, this division of labor is perfectly fine. When looking something up, an extra second or two does not meaningfully hurt the experience. For manufacturers, this approach consumes few on-device resources, so even phones with modest performance can run it. By contrast, the large model reportedly pre-installed on the Galaxy S26 series will probably never reach older models, and that is the difference.
(Image source: Samsung)
The problem is that this logic assumes AI is not used very often. As the direction of phone AI becomes clearer, manufacturers no longer just want AI to "answer your questions"; they want it to "complete operations for you". AI is no longer merely a chat window: it is starting to understand screen content, break down task goals, plan execution paths, and ultimately form a complete AI agent loop.
Once AI enters this high-frequency, continuous, system-level interaction, the weaknesses of device-cloud collaboration are quickly magnified. In a weak-network environment, cloud latency causes visible stalls in an operation; in a chain of consecutive commands, a dropped connection can halt the entire flow. For users, that kind of inefficiency is hard to accept.
That is why manufacturers have recently begun talking so much about "on-device large models". It does not mean they intend to abandon the cloud entirely; rather, they want more of the instant judgments and key decisions to stay on the device itself. For now, device-cloud collaboration clearly remains the optimal solution.
What makes on-device large models hard to implement?
Since device-cloud collaboration has these drawbacks, why are local large models so hard to ship on phones? It is not that manufacturers are unwilling to try; the constraints are simply too clear.
First, there are hardware constraints. Memory, compute, and power consumption are the three core conditions for on-device AI. Even if the model is not enormous, keeping it resident in the background means it continuously occupies system resources. The memory requirement alone has reportedly pushed Apple to increase the RAM in the iPhone.
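To see why memory is the first wall, consider a rough back-of-the-envelope estimate of what just the weights of a resident model occupy (the figures ignore the KV cache, activations, and runtime overhead, and the 3B scale is borrowed from the Dimensity example below):

```python
def model_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate storage needed just for the weights of a resident model."""
    bytes_total = params_billion * 1e9 * (bits_per_weight / 8)
    return bytes_total / (1024 ** 3)

for bits in (16, 8, 4):
    print(f"3B model at {bits}-bit weights: ~{model_footprint_gb(3, bits):.1f} GB")
# 3B model at 16-bit weights: ~5.6 GB
# 3B model at 8-bit weights:  ~2.8 GB
# 3B model at 4-bit weights:  ~1.4 GB
```

Even aggressively quantized, a resident 3B model claims well over a gigabyte of RAM before any cache is counted, and on a phone with 8-12 GB of memory it competes directly with foreground apps.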
Second, there are stability and maintenance costs. A cloud model can be iterated quickly and errors fixed immediately, whereas once a local model ships, its optimization cadence depends on system updates. For system-level AI, that means higher risk and higher testing costs.
(Image source: Oneleaks)
What changed in 2025 is that a significant jump in chip capability has brought purely on-device large models on phones within reach.
Take the Snapdragon 8 Elite Gen 5 as an example. Qualcomm has disclosed that its Hexagon NPU can reach an output speed of roughly 200 tokens/s in local generative tasks. The significance of this figure is that an on-device model can now sustain continuous, natural language generation, and that continuity is a prerequisite for AI to execute complex interactive instructions.
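As a rough sanity check of what that throughput means for interaction (the reply length and prefill time here are illustrative assumptions, not Qualcomm figures):

```python
tokens_per_second = 200   # local decode speed cited for the Hexagon NPU
reply_tokens = 150        # assumed length of a typical assistant reply
prefill_seconds = 0.3     # assumed prompt-processing (prefill) time

total_seconds = prefill_seconds + reply_tokens / tokens_per_second
print(f"~{total_seconds:.2f} s to generate a {reply_tokens}-token reply locally")
# ~1.05 s, with no network round trip in the loop
```

At that speed an on-device model can keep pace with a multi-step agent loop instead of stalling between steps.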
Likewise, MediaTek's Dimensity 9500 takes a more aggressive energy-efficiency approach with its NPU 990. According to MediaTek, on a 3B-scale on-device model its generation efficiency improves while overall power consumption drops significantly. That means the on-device model is no longer something that can merely "run once"; keeping it resident is becoming a realistic option.
New phones built on the latest generation of flagship chips have, to varying degrees, cashed in on this extra compute and shipped a range of AI interaction features. Honor's YOYO agent on the Magic 8 Pro, for example, already supports automatic execution across more than 3,000 scenarios.
Even so, handling complex tasks with purely on-device AI remains difficult.
Even the Galaxy S26, reported to carry a built-in local large model, needs to clean up system resources regularly to keep that model resident. That in itself shows that relying entirely on the on-device model to carry complex AI tasks is still unrealistic in the short term.
On-device AI will not overturn the table, but it will become a watershed for flagship phones
Judging by mainstream manufacturers' current choices, device-cloud collaboration is still the most dependable solution.
Take Huawei as an example: Xiaoyi remains the most complete system-level AI assistant among domestic vendors, covering voice interaction, system control, cross-device collaboration, and more. Even so, its core architecture is still classic device-cloud collaboration: the device handles perception and basic understanding, and the cloud takes on complex reasoning.
This is not because manufacturers "cannot do it on-device"; it is a more pragmatic trade-off. Once AI starts intervening deeply in the system and service layers, stability, efficiency, and resource control always matter more than aggressive deployment.
Meanwhile, the most striking change this year is that AI has begun trying to take over the "right to operate" the phone. Doubao Mobile Assistant has pushed large-model capabilities forward into the phone's interaction layer, so that AI not only answers questions but directly understands screen content, plans operation paths, and even simulates the user to complete cross-app actions. This approach immediately energized the whole industry.
(Image source: Doubao Mobile Assistant)
Still, the phone AIs that have begun this kind of "autopilot", including Doubao Mobile Assistant, Huawei's Xiaoyi, Honor's YOYO, and Xiaomi's Super Xiaoai, essentially represent a forward-looking direction. As noted above, it is a skill the next generation of AI phones will have to master.
In any case, on-device large models will not upend the overall direction of phone AI in the short term. Whether it is Samsung, Huawei, or the mainstream domestic vendors, the current choice is still device-cloud collaboration.
After all, phones are not devices designed around large models, so they have to balance performance, power consumption, stability, and security. Once AI intervenes deeply in system operations, user experience cannot be compromised, which is why manufacturers will not rush to follow the trend.
Seen from this angle, the on-device large model may never become the "headline moment" of a phone launch event, but it will quietly raise the technical bar for flagship phones, opening a gap in AI experience between phones with on-device capability and those with cloud-only capability. And that watershed may not be far off.