Wang Xingxing: Insights on Models

Does Unitree not develop robot "brains"? Do Unitree robots not do real work? Wang Xingxing addressed both of these misconceptions.

Graphic by "Intelligent Emergence"

Text by Qiu Xiaofen

Edited by Su Jianxun

While the industry generally regards Unitree as a company focused on robot bodies, Wang Xingxing, founder of Unitree Technology, challenged this stereotype with several remarks at the World Robot Conference (WRC).

At the WRC, Wang Xingxing devoted a large part of his keynote speech to models, algorithms, and data. Several of his views sparked extensive discussion in the industry.

For example, he openly expressed doubts about VLA (Vision-Language-Action), the route currently popular in robotics. He even called it "a relatively simplistic architecture."

His reasoning is that the embodied intelligence field does not yet have enough data. In Wang Xingxing's view, when a VLA model interacts with the real world, the underlying data falls short in both quality and quantity.

This data shortage is already an industry consensus, and many embodied AI companies are scrambling to close the gap by accumulating real-robot data and simulation data, and even by building data collection factories.

Wang Xingxing was blunt about this as well: "People pay too much attention to basic data." Instead, he believes the focus should be on the model architecture of embodied robots, because current models are "not good enough and not unified enough."

"Unitree's model team is actually not small"

Wang Xingxing has previously emphasized in public, more than once, that Unitree's core advantage lies in its robot body hardware rather than in the "brain." Those statements easily gave outsiders the impression that "Unitree doesn't work on robot brains."

During the WRC, Wang Xingxing told "Intelligent Emergence" and other media that although Unitree is cautious about investing in models, "the model team actually has a relatively large number of people, though it is small compared with large AI companies."

△ Wang Xingxing being interviewed by the media. Photo by "Intelligent Emergence"

That said, he firmly believes that the number of people assigned to models does not strongly correlate with the final result. Past experience in the AI field shows, at least, that innovation does not necessarily come from large companies.

"Having more resources, more money, and more people doesn't guarantee that you will develop the world's best technology first. A small-to-medium-sized team also has a chance to develop better models, though the pressure will be great," Wang Xingxing told "Intelligent Emergence" and other media.

When it comes to choosing a technical route for the "brain," Wang Xingxing is hedging his bets. Another of his arguments that sparked industry discussion concerns VLA, currently the hottest approach.

Wang Xingxing disagrees with the industry practice of frantically accumulating large amounts of training data while VLA models themselves are still not good enough. He argues that a more capable embodied model might need only a small amount of data to achieve a higher success rate.

Of course, Unitree doesn't avoid VLA entirely. In his speech, Wang Xingxing mentioned that Unitree is also experimenting with adding AI training on top of VLA models.

However, for the "brain," Unitree clearly leans more toward the video-based route. Last year, Google released a video-driven world model, and Wang Xingxing said that Unitree had already tried a similar method as early as last year.

Specifically, the approach is to first have a video generation model produce a video of "a robot tidying up a room," and then use that video to drive the robot to actually tidy the room.

△ Screenshot of Wang Xingxing's speech

Wang Xingxing predicts that the video-based route may advance faster than the VLA route and is more likely to converge. It is not flawless, however: its high demands on video quality lead to excessive GPU consumption.

Wang Xingxing does, however, have some ideas about how the computing power problem of future robots might be solved.

He predicts that the robotics field will need low-cost, large-scale, distributed computing clusters. If a factory has 100 robots in the future, he believes it is quite likely that a distributed server cluster will be built inside the factory itself, because robots require low communication latency.

Do Unitree robots only perform and not work?

From robots dancing the yangge and playing games like "drop the handkerchief" at this year's Spring Festival Gala to the robot fights that drew crowds at this year's WAIC and WRC, many people have concluded that Unitree's robots only perform and don't do real work.

The impression is especially strong in contrast with newer entrants who are racking their brains to get robots into factories and homes to tighten screws, fold clothes, and make beds.

Wang Xingxing said frankly that having robots work in factories and households is not very realistic at this stage. For now, performance is a relatively easy application for robots to implement.

Yet within Unitree, the largest group of employees is in fact the one working out how to make robots do practical work.

He also explained why Unitree rarely publicizes its robots doing work: "Making robots work poses a great challenge to AI models, and our current implementation is not ideal."

On the question of "working," Wang Xingxing offered his own view: robots should not be limited to single-function tasks such as tidying clothes or cooking. Instead, they should be general-purpose and multi-functional, for example able to serve tea in a factory and also perform.

Wang Xingxing also gave his forecast for the turning point: the "ChatGPT moment" for robots may arrive in 2-3 years at the earliest and 3-5 years at the latest. He believes this wave of embodied intelligence will not last more than 10 years.

But what does the "ChatGPT moment" for robots look like?

Wang Xingxing described a scenario: humanoid robots walk around a venue freely. When you can pick any robot at random, ask it to do something, and have it complete the task for you, robots will have reached their "critical point."

Cover source | Photo by the author

This article is original content by Qiu Xiaofen (邱晓芬). For reprints or content cooperation, please refer to the reprint instructions; unauthorized reprinting will be pursued.