
Li Auto has also ventured into AI glasses. Should XPeng, NIO, and Xiaomi follow suit soon?

X研究媛 · 2025-12-08 19:58
Scaling from 1 to 100 has always been the forte of Chinese enterprises.

The industry has gone from being ignored to bustling with activity, as if a lifetime had passed.

AI glasses with displays remain too technically difficult. AI audio and camera glasses, by contrast, face no real technical barriers, and with Ray-Ban Meta selling over 2 million units, their product-market fit (PMF) has been verified.

Scaling from 1 to 100 has always been a forte of Chinese enterprises. Outsiders cannot know the real motivation behind Li Auto's Livis project, but plenty of enterprises are eager to ride the emerging trend, and they are lining up one after another.

Where does the potential of AI glasses lie?

Twenty years ago it was hard to imagine, but today going out without a smartphone is almost unthinkable. The history of the smartphone shows how a "paradigm shift" in life and work unfolds as technology advances and a new product category penetrates. In the long run, once a "paradigm shift" starts, it is almost inevitable that everyone gets pulled in.

After trying the AI glasses on the market, some loud KOLs such as Li Nan dismissed them as "useless" and a "pseudo-demand". Their voices carry, but their insight runs shallow. Why would users need a pair of AI glasses? Thirty years ago, you could just as easily have asked why users needed a mobile phone.

People wear glasses because they are nearsighted and cannot see clearly; mobile phones were born to meet the need for communication on the move. Judged by each product's point of origin, the "rigid demand" for glasses exceeds that for mobile phones.

Glasses have existed for over a thousand years, while the journey from mobile phones to smartphones took less than a century. Smartphones created a brand-new product category and completely changed user habits; by contrast, adding advanced functions to glasses users already wear every day is clearly the easier path for AI glasses. If smartphones could become the mainstream of consumer electronics, why can't AI + AR glasses?

AI glasses genuinely have the potential to set off a new round of "paradigm shift".

The insight shared at the Li Auto Livis press conference about why Li Auto is making AI glasses is genuinely profound: "From inside the car to outside it, what form factor can provide natural, non-intrusive, long-lasting companionship? Glasses. Glasses are a terminal worn for a very long time every day, with a low presence and very high comfort requirements. They sit very close to the voice input, are highly stable, and do not require users to change their existing habits. They are the best carrier for bringing our intelligent experience outside the car."

The first characteristic of glasses is that their daily usage frequency and the wearer's always-on (real-time online) duration exceed those of mobile phones, which makes them a natural candidate for the next generation of consumer electronics.

AI + AR glasses also have a second special feature: they resolve the long-standing conflict between a larger interaction interface and mobility.

From room-filling IBM mainframes to personal computers, and from desktop PCs onward to laptops, tablets, and smartphones, the driving force has been constant: users want not only a larger display and interaction surface but also greater mobility and portability. Before AI + AR glasses appeared, those two demands could not be satisfied at the same time.

BirdBath glasses from XREAL, RayNeo, Rokid, and VITURE have proven themselves for big-screen gaming and movie watching. Light from the micro-display travels through a folded geometric optical path so that a virtual image forms in a fixed region in front of the eyes, and the wearer sees a high-definition picture equivalent to a 120-inch screen. But BirdBath optics transmit only about 25% of ambient light, so the glasses cannot be worn all day in every scenario, and their use cases stay limited. Meta's Orion, which uses a cost-no-object silicon carbide waveguide substrate with two-dimensional pupil expansion, achieves a 70-degree field of view in a wider frame with over 90% light transmittance, crossing the threshold for everyday wear.
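
As a sanity check on those screen-size claims, the geometry is simple: a virtual image at distance d that subtends a diagonal field of view θ looks like a screen with diagonal 2·d·tan(θ/2). Below is a back-of-the-envelope Python sketch; the 4 m virtual image distance and the 42-degree BirdBath FOV are illustrative assumptions, not vendor specs.

```python
import math

def equivalent_diagonal_inches(fov_deg: float, distance_m: float) -> float:
    """Diagonal (in inches) of a flat screen at `distance_m` that would
    subtend a diagonal field of view of `fov_deg` degrees."""
    diagonal_m = 2 * distance_m * math.tan(math.radians(fov_deg) / 2)
    return diagonal_m / 0.0254  # metres -> inches

# A BirdBath display with a ~42-degree diagonal FOV and a virtual image
# ~4 m away looks like a ~120-inch screen (both figures are assumptions):
print(round(equivalent_diagonal_inches(42, 4.0)))  # ~121
# An Orion-class 70-degree FOV at the same distance feels far larger:
print(round(equivalent_diagonal_inches(70, 4.0)))  # ~221
```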

Moreover, spatial photos and videos captured by binocular AI + AR glasses carry depth and feel more vivid, something a flat physical display simply cannot reproduce. Mainstream consumer-electronics content today is still "customized" for 2D planes, but that does not mean users have no appetite for higher-order 3D content. According to Jiang Gonglue, the founder of VITURE, once the interaction bandwidth of glasses reaches the upper limit of human senses, we can reshape the real world with digital technology.

Jiang Gonglue gave a real-life example. After VITURE launched its Immersive 3D function, many touching posts came in. One user wrote that his father had just passed away; while sorting through his father's belongings, he found an old hard drive full of photos and videos from his childhood with his father. Every time he puts on the VITURE glasses and views them through the 3D conversion, he is moved to tears, as if his young father were right in front of him.

AI + AR glasses also have a third key variable: the development of large models and the penetration of AI Agents.

OpenAI's ChatGPT has seen the fastest user growth of any product in Internet history. Monthly visits to the OpenAI platform grew from about 19 million in 2022 to about 5.9 billion in September 2025, making it one of the five most-visited websites in the world, on the same scale as Instagram's 6.5 billion monthly visits. Similarweb data show that global monthly visits to AI services reached about 7 billion in September 2025, traffic comparable to mainstream social networks.

When everyone is pulled into the AI "paradigm shift" in work and life, and the count, frequency, and duration of an ordinary person's interactions with AI pass a critical point and stabilize, the strengths of glasses as an AI terminal grow more and more prominent: no need to educate users or build new habits; a long always-on, real-time online state; and shorter-chain operations, responses, and interactions. Hands-free AI glasses mean faster instant responses, with no need to take out the phone, unlock it, open an app, and tap through it.

Li Auto showed several data points at the Livis press conference. Controlling the car without taking out the phone is a very smooth experience. Fan Haoyu, senior vice president of product at Li Auto, described it this way: "Since I got the glasses' car-control function, I haven't taken my phone out of my pocket, unlocked the screen, opened the app, tapped the button, and waited for the car to start in a long time. All of those actions together take 7 to 8 seconds. With the Livis glasses, it takes one sentence."

If glasses are ever to replace mobile phones, it will happen gradually: glasses usage rises while phone screen time falls.

Glasses offer a first-person perspective and hands-free operation, and can become an extension of the body's own organs; smartphones, by contrast, are tools that demand extra effort and adaptation. What makes glasses special is that they capture multi-modal data from the real three-dimensional space in front of the eyes in real time: the voice, pictures, video streams, eye and head movement data, and gaze preferences that users "produce" anytime, anywhere. This kind of personalized data, collected and accumulated all day long, is not only invaluable for training a personal AI Agent but also holds huge commercial potential.

AI and glasses are naturally compatible and resonate deeply.

Some insights from Li Auto's Livis: AI audio and camera glasses in the vehicle scenario

"We don't want to rush to make a product just to make quick money or follow a trend. Instead, we really want to make a highly available product that doesn't burden users and can accompany them for a long time. We want users to truly feel that their work and life have become better because of it."

Li Auto shared its "ideal" for making AI glasses, and the Livis glasses do offer some real insights in the vehicle scenario:

  • Moving car control from the phone to the glasses, for faster, hands-free operation
  • Photochromic lenses that enhance the HUD display
  • A streaming intelligent voice framework
  • Multi-modal, time-stream memory

Li Auto has realized that glasses are a better car-control tool than the phone.

"One sentence can control the car." Inside the car, the microphone on the glasses on the nose bridge is closer to the user than the car - machine microphone, so voice recognition and voice commands are clearer. Moreover, the speakers on the glasses can play sound directionally without interfering with the in - car audio - visual playback.

Outside the car, Livis wearers don't need to take out the phone, unlock it, and open the app to control the car. The car-control flow is smooth and natural, saving the several seconds of fiddly phone-app operation and leaving the hands free to carry items or luggage.
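
To make the shortened chain concrete, here is a minimal sketch of the flow the article describes: glasses microphone, on-device intent parsing, then one direct call to the vehicle, with no phone in the loop. Every name in it (Intent, parse_intent, FakeCar, the action strings) is a hypothetical stand-in, not Li Auto's SDK.

```python
from dataclasses import dataclass

@dataclass
class Intent:
    action: str        # e.g. "unlock", "open_trunk", "start_climate"
    confidence: float

def parse_intent(utterance: str) -> Intent:
    """Toy keyword matcher standing in for on-device NLU."""
    rules = {"unlock": "unlock", "trunk": "open_trunk", "warm": "start_climate"}
    for keyword, action in rules.items():
        if keyword in utterance.lower():
            return Intent(action, confidence=0.9)
    return Intent("none", confidence=0.0)

def handle_voice_command(utterance: str, car) -> None:
    intent = parse_intent(utterance)
    if intent.action != "none" and intent.confidence > 0.8:
        car.send(intent.action)  # one hop, versus phone-unlock-app-tap-wait

class FakeCar:
    def send(self, action: str) -> None:
        print(f"car executed: {action}")

handle_voice_command("open the trunk, please", FakeCar())  # car executed: open_trunk
```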

The "adaptive" photochromic technology can also meet the real needs of drivers. Many drivers like to wear sunglasses when driving long - distance to avoid strong sunlight during the day and high - beam lights at night. HUD has also become a standard feature in smart cars. The Ideal Livis glasses can ensure clear vision, enhance HUD information display, and deal with strong and dazzling lights that may cause temporary blindness during driving.

Li Auto has indeed polished the AI in Livis more than an ordinary car manufacturer would. The Livis team applies a "real-time streaming" multi-modal model, a technology that so far only large-model companies know deeply, and it is impressive that a car manufacturer has absorbed and applied it.

Traditional voice interaction waits for the user to finish a question, accumulates a pile of text, and then generates an answer; the AI is locked into a one-question-one-answer, fixed-rhythm input-output mode. Real-time streaming, by contrast, is the closest thing to natural human conversation and is genuinely hard: the model can answer while being asked and can be interrupted at any moment. It perceives, understands, and formulates its answer while still receiving the user's question, and may even begin understanding voice, pictures, and video streams before the question starts.

Real-time streaming means the model's thinking and interaction happen simultaneously, with neither side waiting on the other, like continuous thought rather than discrete question-answer turns. The on-device omni-modal model released by Mianbi Intelligence at the beginning of this year illustrates well what "real-time streaming" means.

While the MiniCPM-o 2.6 model is thinking about and generating a reply, it can take in new voice prompts in real time and keep up its multi-modal perception. The user can also interrupt the current generation at any moment; while producing the answer to the previous question, the model absorbs the new input inserted during the interruption and starts thinking about a new answer.

The difficulty of real-time streaming lies in ensuring that a user's new voice question does not disrupt the model's ongoing understanding and generation, in other words, that input and output are decoupled. The key is that the model continuously samples frames from and models the surrounding video and audio streams, carrying out this multi-modal understanding in parallel with, or even ahead of, the user's question.
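
A minimal way to picture that decoupling, using plain Python asyncio rather than anyone's actual model stack: one task keeps folding incoming frames into a shared context while another emits a reply token by token, and a barge-in event cancels the in-flight generation without ever pausing perception.

```python
import asyncio

async def ingest(frames: asyncio.Queue, context: list) -> None:
    """Fold incoming audio/video frames into shared context, regardless of
    whether a reply is currently being generated."""
    while True:
        context.append(await frames.get())

async def generate(cancel: asyncio.Event) -> None:
    """Emit a reply token by token; abandon it the moment the user barges in."""
    for token in ["Sure,", "opening", "the", "trunk."]:
        if cancel.is_set():
            print("\n[interrupted: replanning the answer]")
            return
        print(token, end=" ", flush=True)
        await asyncio.sleep(0.1)  # stand-in for per-token decode latency
    print()

async def main() -> None:
    frames: asyncio.Queue = asyncio.Queue()
    context: list = []
    cancel = asyncio.Event()
    asyncio.create_task(ingest(frames, context))       # perception never pauses
    gen = asyncio.create_task(generate(cancel))
    for i in range(3):                                 # camera/mic keep streaming
        await frames.put(f"frame-{i}")
        await asyncio.sleep(0.08)
    cancel.set()            # user speaks again mid-reply -> interrupt generation
    await gen
    print(f"context kept growing during generation: {context}")

asyncio.run(main())
```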

Li Auto claims Livis is currently the only hardware-software integrated product in the world built on a streaming intelligent voice framework. We won't weigh in on that PR claim, but this car manufacturer's depth in large models exceeds my expectations.

At the press conference, Li Auto announced that Livis is customized around the Hengxuan 2800 wearable chip, which delivers low power consumption and long battery life along with very fast response: voice wake-up takes only 300 milliseconds. Livis's full-link optimization, from hardware through software to the model, all serves faster, more natural voice conversation.

More fluent, faster-responding voice interaction matters enormously for AI glasses. Full-link optimization is very hard, but enough quantitative change adds up to qualitative change.

Li Auto's deep dive into AI also shows its ambition in another respect: multi-modal, time-stream memory.

Li Auto claims to have self-developed MindGPT. But from my experience at large-model companies, the pipeline, from data cleaning, mixture tuning, and dataset collection through algorithm design and pre-training on large interconnected GPU clusters, is time-consuming and expensive. Pre-training hides endless details and hard hardware-software integration problems, and reliable AI-infra people are hard to find. During pre-training, runs routinely crash on hardware and data-communication failures; either the parameters fail to converge or the resulting model generalizes poorly.

After pre-training come reinforcement-learning post-training, the emergence of chains of thought, and the strengthening of mathematical reasoning. And Deep Research, which demands multi-step looped reasoning, parallel invocation of composite tool chains including search, integration of multiple models, and unified data formats in the underlying architecture to train an end-to-end AI Agent, lies completely beyond a car manufacturer's reach.

At this stage, Li Auto will run into fundamental difficulties in building time-stream memory over users' multi-modal content. The algorithmic complexity of the attention mechanism grows quadratically with the input context; even the large-model companies are still wrestling with long-term memory, yet Li Auto wants to take a shot at it.
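
The quadratic blow-up is easy to see with a generic attention-cost estimate (nothing here is specific to MindGPT): the attention score matrix has one entry per (query, key) pair, so it is n × n in the context length n.

```python
# Back-of-the-envelope attention cost: the QK^T score matrix is n x n,
# so the compute and memory for that step scale as O(n^2) in context length.
def attention_score_entries(n_tokens: int) -> int:
    return n_tokens * n_tokens  # one score per (query, key) pair

for n in (1_000, 10_000, 100_000):  # minutes vs. a full day of transcripts
    print(f"{n:>7} tokens -> {attention_score_entries(n):>18,} score entries")
# 10x more context => 100x more entries: all-day multi-modal streams add up fast.
```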

Anyone who understands large models knows that once a model is trained, its knowledge is fixed and its parameters generally do not change; bolting on external RAG for so-called personalized generation treats the symptom, not the disease. Counting on AI glasses to accumulate user data, on car-control accessories and customized models to deepen user stickiness, and on a seamless, real-time, all-day intelligent experience from inside the car to outside is, at today's level of technology, still overly optimistic and "idealistic".
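
For readers who haven't seen the pattern being criticized, the toy sketch below shows why RAG is symptomatic relief: the model's weights stay frozen, and "personalization" is just retrieved user history pasted into the prompt. The retrieval function and the llm stub are hypothetical stand-ins, not any vendor's pipeline.

```python
def llm(prompt: str) -> str:
    """Stub for a frozen large model; in practice, an API call."""
    return f"[reply conditioned only on]\n{prompt}"

def retrieve(query: str, memory: list[str], k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval standing in for a vector database."""
    words = set(query.lower().split())
    return sorted(memory, key=lambda doc: len(words & set(doc.lower().split())),
                  reverse=True)[:k]

def answer(query: str, memory: list[str]) -> str:
    context = "\n".join(retrieve(query, memory))
    return llm(f"User history:\n{context}\n\nQuestion: {query}")

glasses_log = [
    "user usually preheats the cabin at 8 am",
    "user parks in garage B2",
    "user listens to podcasts on the commute",
]
print(answer("which garage did I park in", glasses_log))
# The weights never change; whatever retrieval misses, the model never "knew".
```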

Li Auto even announced benchmark scores for its large model at the press conference. Within the industry, "juicing the score" by reinforcement-learning on an evaluation set is regarded as meaningless gaming.