
Giving the large model a real "face": four AI + hardware entrepreneurs look ahead to 2025.

Wang Fangyu 2025-01-13 11:24
From "Using AI" to "Using AI Well", how many mountains do hardware manufacturers still have to cross?

Interview | Su Jianxun

Text | Wang Fangyu

Editor | Su Jianxun

If at CES 2024 AI was presented mostly as a standalone highlight by a handful of companies, at this year's CES 2025 the integration of AI into the consumer electronics industry is broader and deeper, much like the show's theme, "Dive In".

Take smart glasses as an example. At this CES, from AR makers such as Rokid, Thunderbird, Xreal, and INMO, to cross-over players such as Xingji Meizu, Thunderobot Technology, and DPVR, to startups such as Haliday and Vuzix, Chinese manufacturers staged an "AI Hundred Glasses Battle" thousands of miles from home in the United States.

On the CES show floor, "Intelligent Emergence" found that all kinds of everyday hardware, from stringless guitars, AI facial masks, and rings to walking sticks and AI bicycles, have become new "faces" for AI large models to land on. That is to say nothing of the consumer electronics that embraced large models long ago: AI glasses, AI headphones, AI companion robots, AI PCs, phones, learning machines, and more.

The theme of this CES, "DIVE IN"; photo: Su Jianxun

AI is everywhere at this "Spring Festival Gala" of the consumer electronics industry. But behind the enthusiasm, the AI hardware industry needs some sober, deeper reflection:

From "using AI" to "using AI well", how many mountains do hardware makers still need to cross? When shipping a product with an AI large model is no longer rare, can AI still give products selling points and a premium? Large models keep iterating: can the smart hardware they power keep up?

Even more pointedly, the founder of one smart hardware maker posed a soul-searching question to "Intelligent Emergence": when most players on a track are already using AI, doesn't that suggest the barrier to entry is too low?

At CES 2025, "Intelligent Emergence" interviewed four smart hardware makers actively embracing AI large models: Future Intelligence (AI headphones), Xueersi (AI learning machines), Li Weike (AI glasses), and INAIR (AR glasses). They shared the current practices, explorations, and challenges of applying AI large models in their respective niche segments.

Xueersi CTO Tian Mi: Chinese users rarely pay for AI software; combining software and hardware is a better path.

1. In China's smart hardware industry today, no manufacturer has truly shipped an on-device large model in production; everything runs in the cloud, because domestic on-device chips are not yet mature enough to run large models.

2. But within the next 2-3 years, I predict some simple large models will run on-device, with the remaining complex operations relying on the cloud.

3. It has been less than a year since we put a large model on Xueersi's hardware; we spent the two years before that exploring. We found that standalone AI software is very hard to monetize in China: Chinese users will not pay for an app, because they do not perceive AI technology alone as valuable.

Combining software and hardware is a form consumers can actually feel. We have built various AI functions into Xueersi's learning machine, and real usage data shows the most frequently used features are the AI applications: AI grading, AI problem-solving, and interaction with the built-in assistant "Xiaosi".

Xueersi's AI learning machine; image source: courtesy of the company

4. At first we hoped to train our own large model from scratch, but after training for a while we found that better and better open-source base models kept emerging, so doing pre-training ourselves was very uneconomical. We then took the world's best open-source bases and continued training them with a great deal of specialized knowledge in education.

Our approach drops the general-knowledge pre-training but omits none of the other steps: domain-knowledge pre-training, fine-tuning, and reinforcement learning all continue.
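The staged recipe described above can be sketched as a simple training plan: start from an open-source base, skip general pre-training, and keep the domain and alignment stages. This is a minimal illustrative sketch; the stage names and the `plan_training` function are hypothetical, not Xueersi's actual pipeline.

```python
# Illustrative sketch of the staged recipe described above: start from an
# open-source base model, skip general-knowledge pre-training, and keep
# domain pre-training, fine-tuning, and reinforcement learning.
# All names here are hypothetical, not Xueersi's actual pipeline.

STAGES = [
    ("general_pretraining", False),    # skipped: covered by the open-source base
    ("domain_pretraining", True),      # continued pre-training on education corpora
    ("supervised_finetuning", True),   # instruction tuning on curated Q&A data
    ("reinforcement_learning", True),  # preference optimization on model outputs
]

def plan_training(base_model: str) -> list[str]:
    """Return the ordered steps actually run on top of a given base model."""
    return [f"load:{base_model}"] + [name for name, run in STAGES if run]

plan = plan_training("open-source-base-7b")
# plan == ["load:open-source-base-7b", "domain_pretraining",
#          "supervised_finetuning", "reinforcement_learning"]
```

The point of the sketch is the ordering: only the first, most expensive stage is dropped, because the open-source base already provides general knowledge.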

5. Compared with earlier AI models, the large model has greatly improved Xueersi's product capabilities in two ways: the accuracy of existing tasks (such as AI grading) has risen substantially, and tasks that previously could not be done at all are now possible.

6. Continued training and reinforcement learning of large models are highly technical and demand very capable people who explore through constant experimentation. Talent in this field must understand algorithms, be able to do engineering, and have strong R&D skills.

7. Likewise, when integrating Xueersi's model capabilities, the hardware form factor matters a great deal for user acceptance; a learning machine, for example, is more convenient to study on than a phone. We now offer both a standalone app and a learning machine, and many manufacturers in China (phone, tablet, PC, and glasses makers) are integrating Xueersi's API.

Future Intelligence CTO Wang Song: Large models are developing in two directions, base models and on-device models.

1. In the future, a wearable device will be a so-called AI agent that accompanies the user at all times rather than something held in the hand like a phone. Equipped with multiple sensors, it can act as the user's eyes or ears, perceiving the surroundings and feeding information back.

2. The current focus of Future Intelligence's iteration is personalization. We extract useful information from users' meeting content into structured form and store it in a database or via RAG to form the large model's long-term memory. That memory is ultimately tied to the user's personal assistant, which generates personalized answers matched to the user's individual preferences.
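The long-term-memory idea above can be illustrated with a minimal retrieval store: extracted meeting facts go in, and the most relevant ones come back to personalize a prompt. This is a toy sketch under stated assumptions; real systems would use embedding search and a vector database, and the `MeetingMemory` class and its word-overlap scoring are illustrative stand-ins, not Future Intelligence's implementation.

```python
# Minimal sketch of the long-term-memory idea described above: extract
# structured facts from meeting content, store them, and retrieve the most
# relevant ones to personalize an answer. Plain keyword overlap stands in
# for the embedding search a production RAG system would use.

class MeetingMemory:
    def __init__(self):
        self.facts: list[str] = []

    def ingest(self, meeting_notes: list[str]) -> None:
        """Store each extracted note as a memory entry."""
        self.facts.extend(meeting_notes)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        """Rank stored facts by word overlap with the query; return top k."""
        q = set(query.lower().split())
        scored = sorted(
            self.facts,
            key=lambda f: len(q & set(f.lower().split())),
            reverse=True,
        )
        return scored[:k]

memory = MeetingMemory()
memory.ingest([
    "user prefers concise bullet-point summaries",
    "project deadline moved to March",
    "user dislikes jargon in reports",
])
context = memory.retrieve("summarize the report for the user")
# `context` would be prepended to the model prompt as personal memory
```

The retrieved facts never leave the store unchanged; they simply become extra context for the assistant, which is what ties the memory to personalized answers.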

3. AI glasses can now run some compute locally; for example, Ray-Ban Meta carries some local models that run in real time on its SoC. But because the SoC in AI headphones lacks sufficient compute, processing still happens in the cloud: today, almost all so-called smart headphones on the market rely on cloud compute.

4. With compute deployed on-device, the large model responds faster and more promptly, and data is more secure. Many users worry about privacy; some investors' meetings, for instance, are highly sensitive, and they do not want the data uploaded to the cloud. Future Intelligence's AI headphones offer this as an option: users' data can stay on the headphones or the phone without ever reaching the cloud.

Future Intelligence's AI headphones; image source: courtesy of the company

5. The AI large model is currently developing in two directions. One is the base large model, whose parameters and data volumes keep growing; the other is the on-device model, which keeps getting more efficient while guaranteeing security and data privacy. The two directions do not conflict.

6. The iteration and evolution of AI capabilities will have a very significant impact on the future of wearables. I predict that within five years some local AI large models will run on headphones. At that point headphones can work as independent devices, and many interaction scenarios will no longer depend on the phone, which will bring a qualitative change to the user-interaction experience.

7. So far, few AI hardware products have commanded a high premium through large-model integration; this is a question of the industry's stage of development. At present, most so-called smart headphones actually rely on software running on the phone. I think truly smart headphones will only arrive once on-device models can run locally on the headphones themselves.

Truly "smart" headphones face two main challenges, both on the hardware side. One is SoC compute: it is very hard to build a chip for headphones that is both tiny and powerful. The other is battery life: putting a powerful SoC in headphones drives up power consumption and shortens battery life to a point users find hard to accept.

Li Weike Founder Ru Yi: The application development cost of AI glasses is much lower than that of the XR ecosystem, and it will not follow the old path of XR.

1. I believe the human eye is still the organ through which we take in information at the highest density. My intuition, therefore, is that AI glasses, as one of the consumer devices closest to the eyes, are the best carrier for conversational, voice-interactive AI.

2. A killer application running on AI glasses will certainly appear within the next two years. It is something Li Weike must deliver; otherwise AI glasses become a mere "shell" with little value.

3. When I founded Li Weike in 2021, my judgment was that AI would see explosive growth within three years. I did not expect it to come so fast; it had already begun by the end of 2022, exceeding expectations. So in the spring of 2023 we made a choice: all in on AI large models.

In designing the product, we have held firmly to two things: first, get AI interaction right; second, get personalization right, so that a thousand users see a thousand faces.

What we value even more is building the entire large model system ourselves, complete and continuously iterable, rather than handing it to a third-party model company we cannot control.

Li Weike's AI glasses; image source: courtesy of the company

4. The AI large model integrates information well on the web, but plugging it directly into glasses does not work well; it needs a fusion process.

For example, if I ask the AI glasses what the weather is like today, a general AI large model will not answer directly but will ask where you are. So for AI glasses to deliver a good experience, the large model running on them must be optimized and adjusted.
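One common way to implement the "fusion" described above is to inject device context into the query before it reaches the model, so a question like "what's the weather" never triggers a clarifying question. The sketch below is illustrative only: `call_model` is a stand-in for a real LLM API, and the context fields and function names are assumptions, not Li Weike's actual design.

```python
# Sketch of the "fusion" step described above: instead of sending the raw
# voice query to a general model (which would ask "where are you?"), the
# glasses inject device context first. `call_model` is a stand-in for a
# real LLM API call; everything here is illustrative.

def call_model(prompt: str) -> str:
    # Stand-in for a cloud LLM call; just echoes the prompt here.
    return f"[model answer for: {prompt}]"

def enrich_query(query: str, device_context: dict) -> str:
    """Prepend structured device context so the model need not ask for it."""
    ctx = "; ".join(f"{k}={v}" for k, v in device_context.items())
    return f"(context: {ctx}) {query}"

def glasses_ask(query: str) -> str:
    # Context the glasses can supply from onboard sensors or a paired phone.
    device_context = {"location": "Beijing", "time": "2025-01-13 11:00"}
    return call_model(enrich_query(query, device_context))

answer = glasses_ask("What's the weather like today?")
```

The design choice is that disambiguation happens on the device, where the context already lives, rather than in an extra round trip of dialogue with the user.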

5. For an AI glasses startup like ours, working on large models does not actually require many people. Our entire large model team is only about ten people, but standing on the shoulders of giants we can fine-tune and optimize.

6. Not just smart glasses: any industry at this stage sees fierce competition. A market without competition is not a prosperous one. Competition is necessary; it educates the market jointly and reaches consumers faster. In the years when the smart glasses market had little competition, the cost of educating the market was too high.

7. XR sold poorly in the past largely because its ecosystem was immature and application development costs were too high. AI glasses will not repeat that path, because their development cost is far lower than the XR ecosystem's. If a suitable scenario is found, one or two developers may be enough to build the agent.

INAIR Product Design Head Qi Jingxuan: In the future, the AI Agent itself will independently become an OS.

1. Ever since operating systems have existed, people have imagined a "little assistant" inside the computer that helps with many things. But past attempts, including Siri, Xiao Ai, and Google Assistant, never worked very well and were ineffective most of the time, because users could not tell where the boundary of the assistant's conversational ability lay. The large model changes this: it gives every question a fallback and lets every conversation proceed.

2. At the end of 2022, when ChatGPT first broke out, we recognized this trend; bringing AI large models into INAIR's products was part of the plan from the start.

The large model's role in INAIR differs from that of most AI glasses and AI hardware on the market. Their AI mainly helps users understand the external world; INAIR mainly helps users handle software- and system-level operations more efficiently.

3. For INAIR, the large model plays a role similar to Copilot on a Microsoft Windows PC: both are important selling points. The large model improves the user experience, enabling more natural interaction and a faster, more convenient workflow.

4. INAIR works with many AI large models, and we have found that different models are good at different things. Doubao, for example, is relatively strong at image understanding, while Xunfei is particularly strong at ASR (speech recognition) interaction. INAIR calls different large models in different scenarios.
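The per-scenario dispatch described above amounts to a routing table from task type to model. The sketch below is a minimal illustration: the model names mirror the examples in the text, but the routing table, the `route` function, and the "fallback-llm" default are assumptions, not INAIR's actual configuration.

```python
# Sketch of per-scenario model routing as described above: different
# providers are stronger at different tasks, so the device dispatches by
# task type. The routing table itself is illustrative, not INAIR's
# actual configuration.

ROUTES = {
    "image_understanding": "doubao",   # strong at understanding images
    "speech_recognition": "xunfei",    # strong at ASR interaction
    "general_chat": "fallback-llm",    # hypothetical default model
}

def route(task_type: str) -> str:
    """Pick the model best suited to the task, falling back to a default."""
    return ROUTES.get(task_type, ROUTES["general_chat"])

# e.g. a photo question goes to the image-understanding model,
# while an unrecognized task falls back to the general model.
```

In practice such a router would sit in front of the actual API clients, so adding a new provider is a table change rather than a code change.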

INAIR's AR glasses; image source: courtesy of the company

5. INAIR's product advantage lies in software-hardware integration. In an integrated environment, multimodal AI can close the loop from perception and prediction to interaction, communication, and execution.

This is also INAIR's product advantage in practice. For example, a user can read an English paper while a Chinese translation renders on the glasses' screen in real time, or simply ask the agent by voice for a Chinese summary of the paper. While watching a movie, a user can ask the agent about the people or objects on screen. These are functions only a system-level, Siri-like agent that can flexibly call different applications can deliver.

All of this could be done with pure software, but it would take mouse clicks, copy-paste, and switching between app pages, which is far more cumbersome. That is the difference between software-hardware integration and pure software.

6. An integrated device can also perceive and predict proactively. For example, if its sensors detect that the user has lingered on an interface for a long time, the system agent can offer targeted service suggestions.

7. We hope that the large model (technology) on the end side can be further improved so that the AI large model can be called on AR glasses without