At a Qualcomm event, Unitree's Wang Xingxing told some hard truths
Wang Xingxing's candid remarks all came out at the event organized by Qualcomm.
Technical routes in robotics currently diverge widely, so the field looks busy but overall progress is modest.
Since the models everyone is developing cannot yet be deployed for use, it is better to open them up, just as OpenAI open-sourced GPT-1 and GPT-2.
The world model that Unitree open-sourced a few days ago cannot be deployed directly in factories.
Both robot makers and chip makers currently overlook the importance of chips for robots.
Mobile phone chips and similar chips have great potential when applied to robots.
Hou Jilei's dialogue with Wang Xingxing
AI and Agents are opening up new possibilities for every kind of terminal. Embodied intelligence is the field most affected precisely because it is so new. For the same reason, plenty of disputes and challenges lie beneath its bustling surface.
Unitree Technology, long a star player in the spotlight, is now dissecting many of the industry's problems head-on.
Perhaps that is because an event like the one Qualcomm organized is so rare. The 2025 Snapdragon Summit in China gathered core players in the terminal field from China and abroad, spanning the upstream and downstream of the industry chain. Issues discussed openly here may soon become the industry's most closely watched topics and therefore be solved more quickly.
Besides Wang Xingxing, players from the hardware, model, and operating system layers also spoke freely:
Gou Xiaofei, Vice President of Li Auto and Head of Intelligent Space R&D
Li Dahai, CEO of Mianbi Intelligence
Geng Zengqiang, Co-founder and Executive President of Thundersoft
Hou Jilei's dialogue with practitioners
Hou Jilei, Qualcomm's global head of AI R&D, held a dialogue with them.
To present these experts' thinking in full, we have edited the dialogue without changing its original meaning. We hope you find it useful.
Computing power, heat dissipation, and communication: in the end, robots need to pay more attention to chips
The ultimate form of Agents on terminals may be embodied intelligence.
Wang Xingxing, the founder, CEO, and CTO of Unitree Technology, said that the company's goal remains a general-purpose AI running on general-purpose robots that can perform all kinds of tasks, whether in factories or at home.
When a robot can complete tasks based on natural language instructions in an unseen environment, it will be the ChatGPT moment for robots.
He broke this goal down into several stages:
1. Demonstration of fixed actions → Achieved (e.g., dancing, martial arts).
2. Real-time generation of arbitrary actions → Expected to be achieved as early as the end of this year or the beginning of next year.
3. Performing tasks in unfamiliar scenarios → Expected to be achievable around the end of next year (e.g., fetching water, tidying up the table).
4. High success rate and precise operation → It will take several more years. The goal is to approach a 99.9% success rate and be able to complete delicate tasks such as disassembling and assembling mobile phones.
For robots to achieve this, one crucial issue is understanding and processing the physical environment and natural language instructions in real time. This places higher demands on the communication capabilities of edge-side AI.
Wang Xingxing said that communication is very important.
Currently, I think many robotics manufacturers and chip manufacturers somewhat ignore the importance of chips for robots.
Take new energy vehicles. The biggest change over the past decade or so is that new communication protocols have sharply reduced the number of cables. Early gasoline-powered cars carried an enormous amount of wiring; the cables in a single car might weigh as much as 100 kilograms.
The same is true in robotics. A communication cable has 4 or 5 wires, and sometimes a lot of time and effort goes into reducing that number. As a robot's performance and reliability improve, cutting the number of cables becomes very important. To this day, the most common failure in industrial robots is a cable problem, which may account for 60-70% of failures.
For a robot, the biggest challenge in reducing the number of cables is to improve the overall communication protocol and enhance communication quality.
I believe that the ultimate vision for future robots is to have only one cable on each arm, which is very neat. There is still a lot of work to be done to achieve this goal, but it is very worthwhile.
On the underlying chips, Wang Xingxing also pointed to the difficulty of deploying large-scale computing power on terminals.
The internal space of a robot is limited, so high-compute chips often cannot be installed. At the same time, battery capacity and heat dissipation are hard problems to solve for a robot of this size.
He believes that in the future, the peak power consumption of the computing hardware deployed for embodied intelligence should preferably stay within 100W, while average power consumption in normal operation may be only 20-30W, roughly the power consumption of several mobile phones.
Excessive power consumption simply won't work. I think there is great potential in applying mobile phone chips and similar chips to robots.
We are currently in the pre-dawn stage, and it is a troublesome one. The biggest problem is that companies' technical routes differ greatly and everyone has their own ideas. This makes the field look very busy, but overall progress is not very fast.
If we really want to develop a general AI model for embodied intelligence, we can afford to be more open at this stage. The models everyone is developing cannot be deployed for use anyway, so it is better to be more open.
Some time ago, Unitree open-sourced a world model based on video generation. It released not only the weights but also the model itself, the dataset, the training source code, and the deployment source code.
Unitree's open-sourced model
Wang Xingxing said that this model cannot be used directly in factories or daily life, so it is better to open it up. This is a bit like OpenAI in its early days: large models were still far from commercial value and real-world deployment, so GPT-1 and GPT-2 were open-sourced.
We also hope that more open-source initiatives can move the whole field forward together.
As for the often-discussed question of the VLA model versus the world model, it is actually very hard to explain clearly, because both the VLA model and the world model come in many variations. Our company will stay open and try various models, both developed in-house and in cooperation with third parties.
Personally, I think we should stay humble in the AI field. There are always smarter and more open-minded people creating better things, and we should keep a humble attitude and learn from them.
Sometimes, I also think we should try to forget many things from the past and not let the past limit our thinking.
Our goal is to make robots truly useful in households and factories. I think adjustments may be needed in chips, communication protocols, computing power, communication architectures, and even the entire wireless communication architecture.
Security issues are also a concern. As more and more robots are sold, some hackers specialize in cracking our robots, which really gives us a headache.
Before the robotics field matures, we can learn from many other fields, such as mobile phones and new energy vehicles, to build a more standardized system, collect data, and train models.
This field is still very new. We face new challenges and problems all the time, and no single company can solve them. We hope more people will join in solving these problems. For example, the Linux system we commonly use still has many vulnerabilities, and fully fixing the underlying vulnerabilities during development takes a lot of time. If a third-party company can solve these problems, we are very willing to cooperate. That would be very valuable.
The edge-side model will be the core orchestrator in the Agent system
An Agent is fundamentally an application form of large models. Today, Agents mostly run in the cloud. As deployment advances, however, edge-cloud collaboration will become inevitable.
Li Dahai, CEO of Mianbi Intelligence, believes that edge-cloud collaboration is now an industry consensus because it provides a better user experience. The cloud side can provide almost unlimited computing power and resources and is responsible for solving complex problems. The edge side is closer to users, so it must respond very quickly and protect user privacy.
The edge side has one very important advantage: it is "always on". It can continuously perceive the world, understand context while keeping data private on the device, and collaborate with different Agents in the cloud to organize and orchestrate complex tasks.
On actual terminals, a car cockpit for example, there should be a relatively strong edge-side model that understands user needs and communicates with the cloud-side model.
Take a simple example. If the edge-side model in the cockpit detects that a child in the back seat is crying, it can activate a powerful language interaction model in the cloud, which asks whether the child wants to chat or tells a story to distract the child. However, the edge side must make this activation decision, rather than having a cloud-side model constantly watch what is going on in the cockpit, which would expose a great deal of private information.
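The gating in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Mianbi Intelligence's or Li Auto's implementation: the names CabinEvent, EdgeOrchestrator, and fake_cloud_agent, the event labels, and the 0.8 confidence threshold are all hypothetical. The point it shows is that raw cockpit observations stay on the device, and only a short, purpose-built request reaches the cloud once the edge side decides to escalate.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CabinEvent:
    """A locally detected cockpit event (hypothetical schema)."""
    kind: str          # e.g. "child_crying" or "silence"
    confidence: float  # detector confidence in [0, 1]


@dataclass
class EdgeOrchestrator:
    """Edge-side gate: decides on the device whether to call the cloud."""
    cloud_agent: Callable[[str], str]
    threshold: float = 0.8  # assumed value, purely illustrative
    sent_to_cloud: List[str] = field(default_factory=list)

    def handle(self, event: CabinEvent) -> Optional[str]:
        # The decision is made locally; the cloud never sees raw observations.
        if event.kind == "child_crying" and event.confidence >= self.threshold:
            request = "Offer to chat with a crying child or tell a story."
            self.sent_to_cloud.append(request)
            return self.cloud_agent(request)
        # Everything else is handled (or ignored) on the device.
        return None


def fake_cloud_agent(request: str) -> str:
    """Stand-in for a cloud-side language interaction model."""
    return f"[cloud] {request}"
```

With this sketch, EdgeOrchestrator(cloud_agent=fake_cloud_agent).handle(CabinEvent("silence", 0.99)) returns None and nothing is sent, while a "child_crying" event at or above the threshold triggers exactly one cloud request.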
I believe that the edge-side model on terminals will be the core orchestrator of the entire Agent system in the future.
So, what will the future AI industry require of edge-side models?
Li Dahai believes that the knowledge density of edge-side models should keep improving.
Edge-side models are deployed on all kinds of hardware devices, enter thousands of households, and interact with different user scenarios, so they need good self-learning abilities, especially self-iteration and personalized development based on exploratory content. Improving the knowledge density of edge-side models is therefore very important. Mianbi Intelligence has proposed that knowledge density doubles every three months. Cloud-side models, by contrast, will focus more on raising their level of intelligence. The difference between the two is quite large.
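To get a feel for what doubling every three months implies, the claim can be read as simple exponential growth. The sketch below is an idealized reading, not a formula given by Mianbi Intelligence: the function name projected_density is hypothetical, knowledge density is treated as a unitless relative value, and the growth is assumed to be smooth.

```python
def projected_density(initial: float, months: float,
                      doubling_period_months: float = 3.0) -> float:
    """Relative knowledge density after `months`, assuming it doubles
    once every `doubling_period_months` (idealized exponential growth)."""
    if doubling_period_months <= 0:
        raise ValueError("doubling_period_months must be positive")
    return initial * 2 ** (months / doubling_period_months)


# One year contains four three-month periods, so the factor is 2**4.
one_year_factor = projected_density(1.0, 12)
```

Under these assumptions, one year of progress corresponds to a 16-fold increase in relative knowledge density.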
While the model develops on its own, it should also collaborate more deeply with chips, terminals, and systems.
On this point, Li Dahai said that in-depth cooperation is very important.
As an edge-side model company, we cooperate very closely with chip manufacturers such as Qualcomm. Only through such in-depth cooperation can the knowledge density of edge-side models be fully realized, so that the same work can be done with lower power consumption.
On the application side, he also believes that the current MCP-based way for Agents to collaborate is definitely not enough. More secure collaboration methods based on user authentication are needed. This is infrastructure that was built in the mobile Internet era and needs to be rebuilt for the AI era.
The core of Agent is the ability to provide services
Terminal hardware is the physical carrier of Agent. Thanks to Agent, hardware terminals such as mobile phones, PCs, and cars have regained new vitality.
For cars, they are already in the process of intelligent upgrading. The arrival of Agent makes this upgrading more comprehensive and in - depth.
Gou Xiaofei, Vice President of Li Auto and Head of Intelligent Space R&D, believes it is a basic industry consensus that cars can achieve autonomous driving. Once they do, the services that can be provided inside the car will become a differentiating competitive advantage for car manufacturers.
AI has created a huge opportunity: it has the potential to integrate seemingly fragmented ecosystems. Many terminals today are in fact isolated ecosystems. For example, PC interaction is based on the mouse, keyboard, and graphical interfaces, while mobile phone interaction is based on touch. AI's more natural, dialogue-based interaction will become a unified interaction mode across terminals.
Now, everyone is talking about Agent. What is Agent?
Today, when people talk about Windows, they assume there are a lot of services behind it. When talking about Android, they also assume there are a lot of services behind it. Similarly, in the future, when people choose which Agent to use, they will actually look at how many services it can bring.
This year, Li Auto is focusing on this as well. Li Auto's Agent, Li Auto Buddy, has connected to a large number of services related to car travel. This year, we have also started connecting to everyday scenarios and services outside the car, such as asking Li Auto Buddy to order a cup of coffee, pay utility bills, or call a chauffeur. It will cover more services across a wider range. We believe that in the future, the core reason for users to choose an Agent will be