AI glasses and AI phones: The hardware-software one-two punch of tech giants
Stop fixating on whose large model has more parameters. The real competition is just beginning.
Within a month, Alibaba launched six AI glasses in one go. Today, ByteDance integrated Doubao directly into the mobile phone system and stocked 500,000 units of the new phone. This is no mere side project; it's a direct fight for the entry point.
No matter how powerful a model is, it's useless if users can't feel that power. Now that everyone is talking about Agents and "direct intention fulfillment", you'll find that what AI companies are really competing over is not "who answers more intelligently", but "who is more like your operating system".
One is a pair of glasses that breaks free from the phone; the other is a phone assistant that rewrites the phone experience. Essentially, both are vying for the "ticket" to next-generation human-machine interaction. What lies behind this is not just a change in interaction mode but also a shift in platform control.
After the competition in the cloud, it's time to decide the winner on the devices.
1. After the peak of large models, the entry point determines the outcome
In the first half of the AI wave, large models were the absolute protagonists. Whoever had more parameters, broader training data, and faster inference could seize the high ground in the industry. In the second half of 2024, however, this model race began to show signs of fatigue.
Not only did leading players like OpenAI and Anthropic push back the release of their next-generation models, but the capability gap among leading domestic large models also narrowed rapidly. The tug-of-war among Quark, Doubao, Wenxin Yiyan, and Tongyi Qianwen over comprehension ability has left users less and less able to tell them apart. Although the technological ceiling has not been reached, users' enthusiasm has plateaued. The model itself is no longer the decisive variable.
So the focus has shifted: from the "strength" of the model itself to how to "put the model into use", and to the people who use it.
However, people don't use models directly; they use services through terminals. This means that whoever controls the touchpoints closest to users will have the initiative in converting model capabilities into service value. In the context of AI, these touchpoints are embedded hardware such as AI phones and AI glasses.
Jin Xian, head of Alibaba's intelligent terminal products, has publicly spelled out the logic: "All the data trained by large models depends on the business data generated at the terminal. Many models collect data from usage scenarios such as mobile phones, tablets, and computers to serve these scenarios." In other words, the terminal is not only the model's distribution channel but also its "feedback loop". Every user call, every interaction path, and every operation record feeds back to reinforce the model's capabilities.
Peng Deyu, a well-known technology industry commentator, told us that this trend becomes even more pronounced when AI enters the "Agent stage". The traditional "question-answer" chatbot logic is no longer sufficient. The new user expectation is "say a word, and it helps me get things done", which means that AI needs not only to understand language but also to take part in executing the actual task chain.
Take the newly released Doubao mobile phone assistant as an example. When a user says "Help me write a positive review for last week's order on Meituan", it has to move across multiple apps, identify page elements, and simulate click paths to complete the full task chain. Without sufficient operating-system permissions and the screen-understanding ability of multimodal large models, this is almost impossible to achieve.
And capabilities like these can only be realized on the terminal itself.
The value of the terminal lies not only in "interaction efficiency" but also in "ecosystem dominance". For large companies, which device users use, which system they perform tasks on, and who has permission to invoke the entry point will together shape the future platform landscape.
OpenAI's acquisition in May this year of IO, the hardware company founded by former Apple chief design officer Jony Ive, for nearly $6.5 billion is regarded as a strategic signal of going all-in on Agent hardware; Google's Gemini team is working with Samsung to promote on-device deployment; and domestic companies like Xiaomi, Li Auto, Alibaba, and ByteDance are each reshaping terminal form factors in their own ways.
This is not just enthusiasm for "manufacturing hardware", but anxiety about "not losing the entry point".
If GPT brought people to the threshold of the AI era, then starting from 2025, the door through which AI truly enters users' lives may not be in the cloud, but in the pair of glasses in front of you or the mobile phone in your hand.
2. Two paths, one goal: Competing for the next-generation entry point
Although both are involved in the AI hardware track, Alibaba and ByteDance's approaches are almost opposite.
Alibaba chose to create a new species from scratch: AI glasses. The six Quark AI glasses launched on November 27, in my opinion, almost all have a "function-first" engineering-prototype style. They don't chase fashion or compromise on form; they aim squarely at practicality. Their mission is not to impress ordinary consumers but to prove out the logic of "perceptual human-machine interaction".
In Alibaba's view, AI glasses are the next-generation "personal mobile entry point". They are not accessories for mobile phones but devices that will gradually take over scenarios the phone serves today. Song Gang, head of Alibaba's intelligent terminal business, stated plainly at the press conference: "It is the device most likely to challenge the mobile phone in the future." This is not just marketing talk but a complete rethinking of interaction.
In the mobile-phone era, users have to download an app, open it, search, and operate it to complete a task. With AI glasses, the hope is that users only need to say a word, such as "Help me take a photo and upload it to Weibo", and the AI will call the camera, recognize the scene, and publish the content. The underlying logic is no longer the app but the Agent: an interaction hub that understands intentions and actively executes them.
Behind this is Alibaba's typical thinking about coordinating cloud models and terminals. For large models to keep iterating, they must be "fed" with business data collected at the terminal. Only by making its own hardware can Alibaba obtain sufficient permissions to connect the entire process of data collection, system calls, and user interaction.
In contrast, ByteDance chose an almost completely opposite path: it doesn't manufacture mobile phones but wants to "remake the mobile-phone system".
The nubia M153 engineering prototype, a collaboration between Doubao and ZTE launched on December 1, is not really new hardware. Its core selling point is the "Doubao mobile phone assistant": an AI Agent embedded in the operating system that can execute a complete task chain. It can understand the screen interface, simulate clicks, and jump across apps to achieve "direct intention to service".
Unlike the shallow instruction execution of traditional voice assistants, the Doubao assistant reaches down to the operating-system level. Using multimodal large models, it understands the graphical interface and can "complete complex tasks within the virtual screen". For example, when a user says "I'm going to Paris next month. Help me mark the favorite restaurants on the map", Doubao can break the request down into six steps, including extracting from social media, marking on Gaode Map, booking tickets on Ctrip, and organizing in the memo, and execute them as a human would.
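The task chain described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Doubao's actual implementation: the `Step` type, the hard-coded `plan_trip_task` planner, and the per-app handlers are all hypothetical names, and only the four steps the article names (out of the six mentioned) are modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Step:
    app: str      # app the agent would operate (hypothetical label)
    action: str   # what the agent would do inside that app


def plan_trip_task() -> List[Step]:
    """Hard-coded stand-in for a planner; a real agent would derive this from the request."""
    return [
        Step("social media", "extract favorite restaurants"),
        Step("Gaode Map", "mark restaurants on the map"),
        Step("Ctrip", "book tickets"),
        Step("memo", "organize the results"),
    ]


def run_chain(steps: List[Step], handlers: Dict[str, Callable[[Step], bool]]) -> List[str]:
    """Run steps in order; stop at the first missing handler or failed step."""
    log: List[str] = []
    for step in steps:
        handler = handlers.get(step.app)
        if handler is None:
            log.append(f"no handler: {step.app}")
            break
        if not handler(step):
            log.append(f"failed: {step.app}")
            break
        log.append(f"done: {step.app}")
    return log
```

Stopping at the first failure reflects the point that one broken step invalidates everything after it in the chain, which is why cross-app stability matters so much for this kind of assistant.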
This is actually "reconstructing the main-control logic of the mobile-phone operating system", making AI the "first entry point" of the system rather than a function in an app.
ByteDance chose a more flexible strategy: collaborating with mobile-phone manufacturers and embedding its software capabilities deep into the device ecosystem. According to GeekPark, citing a former ZTE product manager, the initial sales stock of the nubia M153 is as high as 500,000 units. For a project that pre-installs an AI assistant at the system level, this is a very aggressive figure.
This is not ByteDance's first foray into hardware. As early as 2018, it acquired the Smartisan (Hammer) team to enter the mobile-phone ecosystem; in 2021, it acquired PICO to enter the VR field; in early 2024, it acquired Oladance to enter the AI earphone market... Now, all these hardware resources have been integrated into ByteDance's "Ocean Department", led by Liu Chengcheng, the founder of 36Kr, reporting to Zhu Jun, the head of Flow. Organizationally, this is one of ByteDance's rare strategic-level department configurations.
Alibaba is creating a new entry-point device, while ByteDance is transforming the existing entry-point system. The former uses "device + scenario" to subvert the app logic, and the latter uses "system + model" to rewrite the interaction protocol. But the goal is the same: whoever takes the initiative at the terminal may own the next ecosystem-level entry point in the AI platform era.
No matter how different the paths are, this time both Internet giants have given the same answer: the main battlefield of the AI era is shifting to the device side.
3. Bubble or starting point? The reality and uncertainty of AI hardware
AI hardware sounds like the next "hot spot", but the actual implementation is much more complicated than expected.
Let's first look at the Doubao AI phone. Although an initial sales stock of 500,000 units is a significant investment for a manufacturer of ZTE's scale, it is still far below the shipment volume of mainstream flagship phones, which can easily reach 2 to 3 million units. Moreover, at a price as high as 3,499 yuan, it essentially targets developers and geek users rather than the mass market. This product is more like a "technology-verification entry point", used to test how the AI assistant works in practice, polish the system-call logic, and accumulate templates for system-permission cooperation, than a real consumer electronics product.
However, even in the "preview version", the technical uncertainties exposed by the Doubao assistant are considerable. They span the stability of "task-chain execution", the accuracy of "screen recognition", and the exception handling, accidental-touch detection, and safety fault tolerance needed when performing tasks across multiple apps. AI control at the system level is essentially a reconstruction of the operating-system architecture, and any bug could be disastrous for the user experience.
The official documentation also clearly states that the current "operating the phone" function is still in the technical preview stage and remains far from large-scale stable deployment. This tug between "fantasy" and "reality" also shows that AI Agents are still in the polishing stage.
The same is true for Alibaba's AI glasses. Although launching six products at once signals a high-level strategic bet, there is currently almost no clear market foundation for this type of device in China. In terms of product form, the Quark AI glasses follow a minimalist route of "perception-driven + Agent control", aiming to be "ready to use after power-on, with interaction through dialogue". Logically, they have the potential to subvert mobile phones, but the technical conditions are not yet mature.
In particular, AI glasses still face significant bottlenecks in sensors, battery life, and computing-power integration. To truly achieve "environment recognition + intention understanding + action execution", the device needs, at a minimum, stable multimodal inference and complete scene modeling. This remains a high-threshold proposition in 2025.
A more practical question: are users really ready to hand over the "interaction right" to AI?
The Doubao assistant can already "automatically operate" in the background, bypassing the user's own clicks to close the loop on a task chain. However, this raises another question: how can data permissions, personal privacy, and payment security be ensured? Although the payment process in the official demonstration still retains a manual-confirmation mechanism, an AI Agent's ability to bypass apps and directly simulate interaction still carries a risk of misuse. Especially while security boundaries have not been established and system-permission standards are not unified, AI products with such "exceeding-the-norm capabilities" may fall into a regulatory gray area.
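A manual-confirmation mechanism of the kind mentioned above could look roughly like this. It is a minimal sketch under assumptions, not the official design: `AgentAction`, `SENSITIVE_KINDS`, and the `confirm` callback are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: the set of action kinds that require explicit human sign-off.
SENSITIVE_KINDS = {"payment"}


@dataclass(frozen=True)
class AgentAction:
    kind: str         # e.g. "tap" or "payment" (hypothetical labels)
    description: str  # human-readable summary shown to the user


def execute(action: AgentAction, confirm: Callable[[AgentAction], bool]) -> str:
    """Run an action, but ask the user first when the action is sensitive."""
    if action.kind in SENSITIVE_KINDS and not confirm(action):
        return "blocked"
    return "executed"
```

For example, `execute(AgentAction("payment", "pay order"), lambda a: False)` returns `"blocked"`, while an ordinary tap runs without asking. Where exactly the line between sensitive and ordinary actions should sit is the open question the unsettled permission standards leave behind.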
Nevertheless, this wave of AI-hardware fever is not a bubble.
On the contrary, it is an inevitable stage in the evolution of large-model platforms. When chatbots are no longer new, app user growth is slowing, and model capabilities are hard for users to perceive, AI can reshape its "user-value perception interface" only by reconstructing the form of interaction.
Hardware is not the end goal but a platform-level reconstruction of the loop of "connecting the entry point, calling the system, collecting data, and feeding back to the model".
Currently, Google's AI-glasses project has entered the proof-of-concept (POC) stage; Xiaomi and Li Auto are repeatedly testing the waters with AI glasses and in-car AI assistants respectively; OpenAI acquired IO to build Agent hardware devices; ByteDance is testing end-to-end system integration through the Doubao assistant; and Alibaba is betting on the glasses form to challenge the dominance of mobile phones. Globally, technology companies are making a new round of moves around the "platform-level AI entry point".
This is not just a hardware-update war but a signal that a new platform cycle is starting.
This article is from the WeChat official account "High-level Insights and Trends". Author: Gao Heng. Republished by 36Kr with permission.