
GGV Capital in Conversation with Galaxy Universal Robotics: Making Embodied Intelligence Truly Implementable - Investment Notes Issue 244

GGV Capital | 2026-01-26 15:38
As embodied intelligence develops, how will the structure of society's entire labor force evolve?

Recently, we held the GGV Capital 2025 RMB Fund Annual Meeting, which brought together many investors, well-known company founders, and key partners for in-depth discussion and sharing on cutting-edge fields such as AI, intelligent manufacturing, digital healthcare, and embodied intelligence.

The following is a conversation between Xu Bingdong, Managing Partner of GGV Capital, and Wang He, Founder and CTO of Galaxy Universal Robotics:

Xu Bingdong: Let me introduce Dr. Wang He. Dr. Wang is the founder and CTO of Galaxy Universal Robotics. He earned his doctorate at Stanford University, studying under an academician of three US academies, and he is also the founder and head of the Embodied Intelligence Laboratory at Peking University. As we all know, Dr. Wang has deep professional expertise, yet he is actually very young, born in the 1990s. We are very glad to have this conversation with him about embodied intelligence today.

Wang He: Thank you, Eric, for the introduction. I'm honored to take part in today's event. Galaxy Universal Robotics was founded two and a half years ago and is currently the robotics company with the highest total financing among China's unlisted embodied intelligence companies, which of course is inseparable from GGV Capital's support. I'm very happy to discuss embodied intelligence with you today.

Xu Bingdong: Our conversation today will focus more on the technical side. First of all, Dr. Wang, what important technical achievements has Galaxy Universal Robotics made in the two and a half years since its founding?

Wang He: From the very beginning, we chose a model-plus-hardware path that is highly practical and operationally flexible.

Embodied intelligence is intelligence rooted in the physical world. Robots need to interact with the world at the physical level to generate data for continuous model training. From day one, Galaxy Universal Robotics clearly identified the biggest dilemma of current embodied intelligence: we lack data from rich interaction between robots and the physical world. Large language models obviously face no such severe data shortage; the text, images, and videos on the Internet, especially the massive interactive content on social media, are sufficient to train them.

Autonomous driving, although it also belongs to embodied intelligence, may be an exception: millions of car owners naturally generate data. Most embodied intelligence robots, however, face a data shortage from day one, because the industry does not have millions of robot users generating data.

Today, the reason Galaxy Universal Robotics can progress so fast is that we recognized the global "data cold start" dilemma facing embodied intelligence. Our core technical strategy is to use physical simulation and synthetic data to give embodied intelligence a boost through the cold-start period.

We use more than 99% synthetic data and less than 1% data collected from the real world, making embodied intelligence truly implementable.
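To make this data mix concrete, here is a minimal Python sketch of how a batch sampler might weight simulation-generated samples against scarce real-world ones. The class name, default ratio, and placeholder data are illustrative assumptions, not Galaxy Universal's actual pipeline.

```python
import random
from typing import Any, List

class MixedDataSampler:
    """Illustrative sampler that draws mostly simulation-generated samples
    and a small fraction of real-world samples for each training batch."""

    def __init__(self, synthetic: List[Any], real: List[Any],
                 real_fraction: float = 0.01):
        # real_fraction = 0.01 mirrors the ">99% synthetic, <1% real" mix above.
        self.synthetic = synthetic
        self.real = real
        self.real_fraction = real_fraction

    def sample_batch(self, batch_size: int) -> List[Any]:
        batch = []
        for _ in range(batch_size):
            # Draw a real-world sample with small probability, else a synthetic one.
            pool = self.real if random.random() < self.real_fraction else self.synthetic
            batch.append(random.choice(pool))
        return batch

# Example with placeholder data standing in for robot trajectories.
sampler = MixedDataSampler(synthetic=[f"sim_traj_{i}" for i in range(1000)],
                           real=[f"real_traj_{i}" for i in range(10)])
print(sampler.sample_batch(8))
```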

At last year's World Artificial Intelligence Conference, the Premier personally experienced our intelligent retail robot and instructed us to promote this type of product. Currently, Galaxy Universal's humanoid robots are providing retail services in nearly a hundred stores across dozens of cities in China.

Xu Bingdong: We know that perhaps just half a year ago, many people had difficulty distinguishing between embodied intelligence and robots conceptually. How does Galaxy Universal Robotics define embodied intelligence?

Moreover, does the core technical path of Galaxy Universal Robotics lean more toward the much-discussed "world model"? Does the so-called "world model" really exist? Or do we actually focus more on a self-supervised embodied model driven purely by data?

Wang He: What are the differences among the terms we often mention today: embodied intelligence, robots, and humanoid robots?

The concept of "embodied intelligence" is actually being generalized to some extent. In my understanding, embodied intelligence should be distinguished from traditional robots. In last year's Government Work Report, the Premier defined embodied intelligence as a future industry. This means that the robots delivering meals in hotels according to a planned route or the robotic arms working along a fixed trajectory that we see today are obviously not embodied intelligence - they don't have an intelligent core and are just a set of programs to solve specific problems.

The core of embodied intelligence is the ability to continuously adapt to tasks and the environment and to work out on its own how to handle new situations. It must be data-driven.

As long as it has an intelligent core, it doesn't have to take humanoid form. For example, our quadruped robot dog at Galaxy Universal Robotics, once deployed with a large model, can accompany mothers and children shopping in the mall; it was also a gift we offered the education and parenting market on Children's Day in 2025. Even so, humanoid products remain the most closely watched category in embodied intelligence and are considered the single application that can most change our lives, because they can easily integrate into the human living environment and do the many things humans can do.

Embodied intelligence is the soul of humanoid robots. So how should we understand and activate this soul? In fact, the underlying mechanism is the same as that of us humans with natural intelligence.

In essence, all of us here are large end-to-end embodied models. If we regard each synapse in our brains as a switch, corresponding to a parameter in a large model, then each of us is an individual with quadrillions of parameters, larger than today's largest models. For any instruction, whether it's doing homework, running, or cleaning, we don't need to switch between different "brains" for each task. We use one large model to meet all needs and situations.
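As a rough order-of-magnitude check on that analogy, using commonly cited neuroscience estimates rather than figures from the conversation: the human brain is usually credited with on the order of 10^14 to 10^15 synapses, while today's largest language models are on the order of 10^12 parameters, so the analogy's gap of a hundred to a thousand times is plausible.

$$
N_{\text{synapses}} \sim 10^{14}\ \text{to}\ 10^{15} \;\gg\; N_{\text{LLM params}} \sim 10^{12},
\qquad \frac{N_{\text{synapses}}}{N_{\text{LLM params}}} \approx 10^{2}\ \text{to}\ 10^{3}.
$$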

When we carry out a task through action, we consciously mobilize every muscle fiber, telling ourselves how our hands and feet should move to turn the intended action into reality. At the same time, we also hold expectations about the effect of our actions: what will happen to the world we interact with once we act? The latter belongs to the category of the world model. The former, mapping an instruction to how the body should act, is the inverse model.

Both models exist at the same time, and both matter. For embodied intelligence, the world model alone is not enough. When we pick up an object and throw it, our brains certainly cannot predict at centimeter precision where it will land or how many times it will bounce. Even so, we usually handle the whole process with ease.

The world model is a means of learning, but it is not the whole of intelligence. It is not even required by first principles in embodied intelligence; the inverse model is closer to the first principle.
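To make the distinction concrete, here is a minimal Python sketch, under my own naming assumptions, of the two mappings described above: a world (forward) model predicts the next state from the current state and an action, while an inverse model maps the current state and an instruction to an action. This illustrates the general idea only; it is not Galaxy Universal's architecture.

```python
from typing import Protocol

State = list[float]   # e.g. joint angles and object poses (illustrative)
Action = list[float]  # e.g. motor commands (illustrative)
Goal = str            # e.g. a language instruction

class WorldModel(Protocol):
    """Forward model: 'what happens to the world if I take this action?'"""
    def predict_next_state(self, state: State, action: Action) -> State: ...

class InverseModel(Protocol):
    """Inverse model: 'given what I see and what I'm asked, how should I act?'"""
    def select_action(self, state: State, goal: Goal) -> Action: ...

def control_step(policy: InverseModel, world: WorldModel,
                 state: State, goal: Goal) -> tuple[Action, State]:
    """One control step: the action comes from the inverse model; the world
    model is used only to anticipate the outcome, not to choose the action."""
    action = policy.select_action(state, goal)
    expected_state = world.predict_next_state(state, action)
    return action, expected_state

# Trivial stand-in implementations so the example runs end to end.
class ZeroPolicy:
    def select_action(self, state: State, goal: Goal) -> Action:
        return [0.0] * len(state)

class LinearWorld:
    def predict_next_state(self, state: State, action: Action) -> State:
        return [s + a for s, a in zip(state, action)]

print(control_step(ZeroPolicy(), LinearWorld(), [0.1, 0.2], "pick up the bottle"))
```

In this framing, the control loop can act purely through the inverse model, with the world model used only to anticipate outcomes, which mirrors the point that the world model is a learning aid rather than a prerequisite.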

Xu Bingdong: I really agree with what you've shared. If we divide embodied intelligence into several stages, motion control, task planning, general behavior, and real-world autonomy, which stage would you say Galaxy Universal Robotics' products are at? And what problems do we need to solve to reach the next stage?

Wang He: For any implementation scenario of embodied intelligence, a closed loop needs to be achieved across motion control, task planning, general behavior, and real-world autonomy. We haven't achieved general behavior yet; of course, as we all know, no one in the world can claim to have achieved it.

Let me describe some scenarios Galaxy Universal Robotics can already handle. In crowded areas such as Wangfujing in Beijing, the Bund in Shanghai, West Lake in Hangzhou, or Chunxi Road in Chengdu, facing large numbers of tourists and all kinds of drinks, cultural and creative ice creams, and so on, our robots can complete the whole process from picking up goods to delivery. The process involves visual guidance, and the robots also need to understand human language. While moving, they need to turn freely and squat or stand according to shelf height. For hanging goods, they need to know how to unhook them gently; for bottled goods, they need to know how to reach out and grab them. We developed the technology behind this entire set of detailed requirements independently.
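As a purely illustrative sketch of how such a retail workflow might decompose into skills, here is a small Python example of a dispatcher that picks a posture by shelf height and a grasp strategy by packaging type. The skill names, categories, and threshold are hypothetical and not Galaxy Universal's actual system.

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    packaging: str        # "hanging" or "bottled" (illustrative categories)
    shelf_height_m: float

def choose_posture(shelf_height_m: float) -> str:
    # Hypothetical threshold: squat for low shelves, stand otherwise.
    return "squat" if shelf_height_m < 0.6 else "stand"

def choose_grasp(item: Item) -> str:
    # Hanging goods are unhooked gently; bottled goods are reached and grabbed.
    return "unhook_gently" if item.packaging == "hanging" else "reach_and_grab"

def serve_customer(instruction: str, item: Item) -> list[str]:
    """Very simplified pick-and-deliver sequence for one requested item."""
    return [
        f"understand_instruction('{instruction}')",
        f"navigate_to_shelf('{item.name}')",
        choose_posture(item.shelf_height_m),
        choose_grasp(item),
        "hand_to_customer",
    ]

print(serve_customer("One of those ice creams, please",
                     Item("creative ice cream", "hanging", 1.2)))
```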

We are very proud that nearly a hundred of our retail pods have been deployed across China. Not long ago, the Optimus robot developed by Elon Musk's team just managed to pick up a plate and hand out goods in Times Square, New York: the robot basically stood still, then picked up a sugar packet from the tray and handed it to the audience. Even that simple behavior was only available as a brief experience. By contrast, our Galaxy Universal Robotics pods can be experienced 24/7.

I know that people's expectations for humanoid robots go beyond selling goods at retail. Looking from the current stage toward the future, we are also thinking about and planning how to leap from single skills to a collection of skills, and then to freely deploying a much wider range of skills. Currently, the skill set of navigation, grasping, and placing forms our first-generation base model. On that basis, we also hope humanoid robots can handle all kinds of objects well on tabletops, on shelves, and in deep bins. We have already deployed dozens of intelligent medicine warehouses across China that are fully operated by our robots, letting them make full use of their ability to pick medicines quickly and accurately.

Currently, our robots are no longer limited to two-finger grasping and placing. Not long ago, we released the Dexterous Hand Neuro-Dynamics Model (DexNDM), which for the first time enables a general dexterous hand to stably rotate complex objects in any posture and around any axis. The robot can use its five dexterous fingers to hold a small screwdriver and drive in screws bit by bit. With embodied intelligence skills iterating at this level, we will open up a multi-trillion-dollar assembly and operations market.

We will expand our capabilities step by step. For embodied intelligence, however, there may not be a "ChatGPT moment": a day before which your humanoid robot can do nothing and after which, thanks to one key technical breakthrough, it can do everything. Personally, I don't think such a change will arrive overnight. It will take a long process of continuously accumulating data, achieving closed-loop model deployment in real scenarios, and iterating the hardware, eventually forming a product that can serve one industry and then spread to similar tasks. So I don't think large numbers of manual workers will suddenly lose their jobs in any given year; the development of embodied intelligence is gradual. From another angle, though, perhaps after a patient decade or more of development we can truly make the grand leap from producing hundreds of robots a year to providing 100 million units of labor across the country.

Xu Bingdong: You just mentioned that there may not be a "ChatGPT moment" in the embodied intelligence industry, with no sudden qualitative leap on a particular day. We also see that some embodied intelligence companies in the United States seem to be steering development toward the consumer market, trying to get embodied robots into ordinary homes as soon as possible, which is the future scenario people have long envisioned. What do you think of this? And how do you view China's advantages in the embodied intelligence race?

Wang He: Some companies in the United States are indeed quite aggressive in this regard; of course, the path they choose also matches their valuations to some extent. But for products like robots, anyone who has used them will understand that bringing them into the home is by no means easy. Take the robots many people have already experienced: conceptually they seem simple, needing only to map the whole house and move around to work, yet in actual use even today's well-regarded, easy-to-use robot vacuums run into plenty of concrete problems during operation.

When we bring robots into our homes and into daily life, we of course hope they can do the housework on their own. It is hard to accept a robot that does a poor job, that creates trouble instead of helping, and that leaves us cleaning up after its mistakes. So with today's still imperfect technology, I find it hard to agree that pushing robots into homes right away is a good development path.

I think Chinese humanoid robot companies must be pragmatic and get their products actually doing useful work first, then let this new productive force gradually take on more workload and more types of work through data-driven methods.

We chose to start with retail. On the one hand, the industry has a large demand for labor: overseas there are many restocking and inventory scenarios that require a lot of human work, and in China there is also great demand for labor around the front warehouses serving residents within a 15-minute radius. On the other hand, retail is relatively tolerant of errors: even if a product is accidentally dropped, it doesn't cause much trouble. Once a model reaches 99% accuracy, it can be used commercially in retail. In an industry like autonomous driving, by contrast, 99% accuracy is not enough for commercial use, because when an accident happens it is often a serious one.
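To make the error-tolerance point concrete with purely illustrative numbers (the action volumes below are assumptions, not figures from the conversation): at a fixed 99% per-action success rate, the expected number of failures scales with volume in both settings; what differs is the cost of each failure.

```python
# Illustrative arithmetic only; action volumes and consequences are assumptions.

def expected_failures(actions_per_day: int, success_rate: float) -> float:
    """Expected number of failed actions per day at a given success rate."""
    return actions_per_day * (1.0 - success_rate)

# Same 99% per-action accuracy in two very different settings.
retail_failures = expected_failures(actions_per_day=1000, success_rate=0.99)
driving_failures = expected_failures(actions_per_day=1000, success_rate=0.99)

print(f"Retail pod: ~{retail_failures:.0f} dropped items per day -> minor cleanup cost")
print(f"Driving:    ~{driving_failures:.0f} critical errors per day -> unacceptable safety risk")
```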