Dialogue with Xiong Youjun: Building an "OpenAI" for Embodied Intelligence to Narrow the Gap in Robot Technology between China and the US | Emerging 36 People
This article was first published on the public account "Intelligent Emergence".
Written by Tian Zhe
Edited by Su Jianxun
The American TV series "Westworld" depicts a society in which humans and robots coexist: in a technologically advanced future, robots are almost indistinguishable from humans in appearance and behavior, and humans can interact with robots at will in an amusement park.
In Xiong Youjun's view, in the future, robots will not be confined to the amusement park but will become a part of human life. Robots can be human life assistants, friends, or even a part of the body. It is with this dream that Xiong Youjun has been immersed in the robot industry for more than 20 years.
Xiong Youjun is one of the co-founders of the robot company Ubtech. He has led the development of several humanoid robots, one of which, "Walker", has performed twice on the CCTV Spring Festival Gala stage. Earlier, he was also responsible for major artificial intelligence innovation and development projects for multiple departments, including the National Development and Reform Commission.
He divides the development stages of robots into three periods: weak artificial intelligence, strong artificial intelligence, and super artificial intelligence. Although robot technology has developed for decades, it is still in the stage of weak artificial intelligence. He believes that only by achieving general artificial intelligence and enabling robots to interact with the real world can robots enter the next development stage.
In 2023, Xiong Youjun left Shenzhen, where he had lived for more than ten years, and went to Beijing to become the CEO of the National and Local Jointly Built Embodied Intelligent Robot Innovation Center (hereinafter referred to as the "Innovation Center").
This is a significant change in his life. "We are a national-level innovation platform. We not only need to achieve breakthroughs in key common technologies but also bear the responsibility of driving industrial development," he said.
In 2023, a large number of robot start-ups emerged one after another, exploring different technical routes. However, this also means that the industry must go through a long cycle of technical exploration and verification before robots can be mass-produced. Meanwhile, the United States has not only the industry pioneer Boston Dynamics but also newer robot companies such as Figure and Agility Robotics (maker of Digit). Compared with their Chinese counterparts, American robot companies have greater advantages in capital and top talent.
In August this year, Xiong Youjun asked Mark Raibert, the founder of Boston Dynamics, a question: Why is Boston Dynamics, with such powerful technology, in no hurry to commercialize? Mark Raibert's answer was very simple, just one sentence: "I don’t care."
This was beyond Xiong Youjun's expectations. "He simply doesn't care about these things and devotes a great deal of energy and resources to keeping the technology leading and innovative."
In Xiong Youjun's view, the overall technological gap between Chinese and American robots is not large, but Chinese robot start-ups have access to fewer resources, and there is also a degree of reinventing the wheel. This is why he decided to join the Innovation Center: to accelerate the maturation of domestic robot technology and industry standards.
After one year of research and development, the Innovation Center has created three robots on the general robot platform "Tian Gong", achieving technical breakthroughs in humanoid walking, humanoid running, and embodied operation. The multi-functional embodied intelligent agent "Kai Wu" focuses on creating a universal robot "brain" and "cerebellum", so that the same technical solution can be applied to different forms of robots. In the future, the technologies developed by the Innovation Center will be progressively open-sourced.
In Xiong Youjun's view, simply open-sourcing the technology is not enough to significantly promote the progress of the industry. The Innovation Center is also responsible for formulating robot industry standards and norms. It is reported that the Innovation Center has participated in formulating 4 national standards and 3 international standards.
As humanoid robot technology matures, Xiong Youjun predicts that almost everyone will eventually be able to afford a robot. But he is also worried that a price war will play out again in the robot industry. "This is a kind of harm to the industry, and I don't want to see it."
"If that day really comes, what role can the Innovation Center play?" "Intelligent Emergence" put this question to Xiong Youjun.
"It means that robot technology has become very mature, and there is no need to solve common technical problems. At that time, the mission of the Innovation Center will have been completed, and we will also start a new journey," Xiong Youjun said.
The following is a conversation between "Intelligent Emergence" and Xiong Youjun, the CEO of the National and Local Jointly Built Embodied Intelligent Robot Innovation Center. The content has been slightly edited:
Building an OpenAI for the Open Source of the Robot Industry
"Intelligent Emergence": Let's first talk about the responsibilities of the Innovation Center. How do you think it differs from commercial companies?
Xiong Youjun: The National and Local Jointly Built Embodied Intelligent Robot Innovation Center is a new type of research and development institution defined by the country. It represents the country and the entire robot industry, and is committed to conquering common and key core technologies. The ultimate goal is to promote the Chinese embodied intelligent robot industry to occupy the commanding heights of global scientific and technological competition.
"Intelligent Emergence": Can the Innovation Center be compared to the early OpenAI?
Xiong Youjun: Yes. I think the Innovation Center is a guide in the robot industry, gathering industry resources and then promoting applications.
"Intelligent Emergence": What is the Innovation Center mainly doing at present? How can it promote the development of the robot industry?
Xiong Youjun: The Innovation Center is mainly promoting the development of embodied intelligent robot technology and the building of its ecosystem. It focuses on two important tasks: one is the "Tian Gong" general robot mother platform, and the other is the "Kai Wu" multi-functional embodied intelligent agent platform.
The Innovation Center is also responsible for leading national-level tasks, tackling key technologies, writing international and national robot-related standards, and cooperating with scenario partners to promote pilot applications of robots.
"Intelligent Emergence": How large is the team at present?
Xiong Youjun: Our team currently has nearly 200 people, with an average age of 32, and R&D staff account for about 70%.
"Intelligent Emergence": The work of the Innovation Center and your previous work at Ubtech are very different. How do you switch roles?
Xiong Youjun: I am still the part-time CTO of Ubtech, so I can continue to follow and promote Ubtech's R&D work. At the same time, Ubtech is one of the main shareholders of the Innovation Center and gave it great support when it was first established. Ubtech not only dispatched R&D personnel to help start multiple projects but also opened more than 300 patents to the Innovation Center.
I believe that Ubtech has made important contributions to promoting the development of the Innovation Center, especially in understanding the pain points of the robot industry.
"Intelligent Emergence": What pain points does the robot industry need to solve in order to promote the large-scale mass production of robots and achieve commercial application?
Xiong Youjun: I think there are a few main aspects:
First, the technology still needs time to mature. This includes not only the technology of the robot itself but also technologies in multiple fields such as artificial intelligence and embodied perception, as well as improvements in motion control.
Secondly, the maturity of the supply chain also restricts the large-scale application of robots. Robot output is currently low, so costs cannot be brought down.
Because the technology and the supply chain are both immature, humanoid robots are not yet mass-produced, even though market demand exists. As the technology iterates and robot output increases, these problems will be solved. This is a gradual process.
"Intelligent Emergence": If new technologies emerge in the industry, will the Innovation Center immediately invest in tracking and R&D? What measures will the Innovation Center take if it wants to serve the industry?
Xiong Youjun: In the face of new technologies, we will first conduct an evaluation, mainly considering the maturity of the technology, the application prospects, and whether it is in line with our long-term strategy.
If the new technology fits our direction, we will choose to invest in R&D or work with industry partners to advance it. In addition, the Innovation Center's Tian Gong open source project and the construction of its embodied intelligent data set provide technical incubation and support that help drive the industry's technological progress.
"Intelligent Emergence": Now there are no standards for robot technology and components. Does the Innovation Center have standards for the technical routes of robots?
Xiong Youjun: Humanoid robot technology serves different industries, and the solutions will differ, so there will not be a single unified standard. For industrial robots, the most cost-effective solution will definitely be found in the future; for commercial and home service robots, there will also be corresponding solutions. Currently, the technical route of humanoid robots is still in the exploration stage, and none of the directions is mature enough. In the future, a particular technical solution may develop faster, and the industrial chain supporting it may become more complete.
Therefore, in such a rapidly developing industry, the standards cannot be fixed too rigidly now; they need to be adjusted dynamically as the industry develops.
"Intelligent Emergence": Does the responsibility of the Innovation Center also include driving the formulation of robot-related policies and standards?
Xiong Youjun: That's right. This is a very important part of our work on common key technologies. For the industry to develop, it must first be standardized, so this standards work has to be done. Since its establishment, the Innovation Center has led three international robot standards and four national standards, and has also released some industry standards.
The Gap Between Robot Intelligence and the Body is Not Large
"Intelligent Emergence": What is the current R&D focus of the Innovation Center?
Xiong Youjun: In addition to "Tian Gong", we have also allocated more resources to building the robot's "brain", which is the "Kai Wu" platform. It is an embodied intelligent agent that enables robots to achieve "one brain for multiple machines" and "one brain for multiple functions".
"One brain for multiple machines" means that the "Kai Wu" platform can serve not only our Tian Gong series robots but also robots from other companies, including humanoid robots, quadruped robots, and industrial robots, making them more intelligent. "One brain for multiple functions" means that the platform can adapt to different scenarios, such as industry, commercial services, and households.
Centered on this "brain", we have also built an embodied intelligent data set platform to jointly develop diverse application scenarios with multiple partners. In addition to industrial scenarios, we are also working in the special-purpose, household, and commercial service fields, and in the future we will build the largest, densest, and most universal embodied intelligent data collection platform in the world.
"Intelligent Emergence": Robots come in various forms, and even the number of fingers of different robots is different. How does the "Kai Wu" platform achieve serving multiple types of robots?
Xiong Youjun: First, for different types of robots, such as five-finger, four-finger, and two-finger robots, we select one mainstream robot of each type and collect data for different actions. Secondly, we deploy a set of embodied intelligent algorithms that give Kai Wu a universal ability to adapt to various robots.
"Intelligent Emergence": What technical route does the robot currently developed by the Innovation Center choose?
Xiong Youjun: We are now inclined to a pure vision and bionic route.
"Intelligent Emergence": What is the reason?
Xiong Youjun: Because the bionic cost is reliable and the product is also controllable.
"Intelligent Emergence": We understand that the Innovation Center has now launched three self-developed "Tian Gong" robots. In which areas have they achieved advantages, and what are the future plans?
Xiong Youjun: Tian Gong is actually a complete system. When the Innovation Center was established, I set five key tasks: the humanoid robot body, the motion control algorithm, the embodied intelligent large model, the robot operating system, and the robot tool chain.
At present, we have made some phased achievements in parts such as the robot's legs and arms. Next, we will open source a complete set of motion control algorithm libraries, including model preset control, new motion control algorithms, and reinforcement learning and imitation learning networks. In addition, the "Kai Wu" platform will also be progressively open-sourced in the future to promote technological progress and resource sharing across the entire industry.
"Intelligent Emergence": Is the Innovation Center's primary task to solve the motion control of the robot or the problem of the robot body?
Xiong Youjun: Yes, the body and motion control are the key problems to tackle in the first stage, and we are simultaneously advancing the development of the robot's "brain".
"Intelligent Emergence": We note that Kai Wu can enable the robot to decompose and execute complex long-range tasks. How is this achieved?
Xiong Youjun: The ability to execute long-range tasks is the key to the intelligence of robots. The more steps in a long-range task, the more complex the task is. We are working to enable "Kai Wu" to complete complex tasks with more than 50 steps, and at the same time, it can flexibly respond to various tasks in different scenarios.
The core of "Kai Wu" is the design of the "embodied brain + cerebellum": the brain is driven by an AI model, responsible for task planning, logical reasoning, and scene understanding; the cerebellum is responsible for specific actions, such as performing skills, handling errors, and real-time feedback. The two work together to complete the task through the intelligent agent framework.
In addition, the Innovation Center is also building a national-level embodied intelligent data platform to collect, label, and optimize various data. This not only makes "Kai Wu" learn faster but also enables it to perform better in more scenarios.
"Intelligent Emergence": With the ability to perform 50-step long-range tasks, theoretically, in which scenarios can robots be applied?
Xiong Youjun: In the future, these capabilities will enable robots to be widely used in manufacturing, service, and household scenarios. In factories, robots can undertake high-complexity, long-process, and fine tasks; in the service industry, robots that can perform long-range complex tasks will meet diverse needs, not just limited to simple conversations.
"Intelligent Emergence": Can the current Tian Gong robot understand and execute tasks such as "Give me a bottle of cola"?
Xiong Youjun: At present, the Innovation Center has achieved basic long-range task execution, for example handling scenario tasks such as preparing breakfast. With technological progress, data accumulation, and the optimization of the embodied intelligent large model, robots will become more powerful and be able to complete more types of complex tasks.
For tasks such as "Give me a bottle of cola", it can be achieved through the "embodied brain + cerebellum" architecture: the AI large model (embodied brain) is responsible for task planning and making action decisions; the data-driven end-to-end skill module (embodied cerebellum) is responsible for performing specific actions, such as opening the refrigerator, taking out the cola, and handing it to the user.
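Editor's illustration: the division of labor described above can be pictured with a minimal Python sketch. It is not the Innovation Center's code or the "Kai Wu" API. Every name in it (`WorldState`, `plan`, `SKILLS`, `run_task`) is hypothetical, the planner is a hard-coded stand-in for the AI large model, and the skills are toy functions rather than learned end-to-end modules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WorldState:
    """Toy world state, invented for this sketch."""
    fridge_open: bool = False
    holding: str = ""
    delivered: List[str] = field(default_factory=list)


def plan(instruction: str) -> List[str]:
    """Stand-in for the 'embodied brain': turn an instruction into skill names.

    A real system would query a large model here; this rule is an assumption.
    """
    if "cola" in instruction.lower():
        return ["open_fridge", "take_cola", "hand_to_user"]
    return []


# Stand-ins for 'embodied cerebellum' skills. Each returns True on success.
def open_fridge(state: WorldState) -> bool:
    state.fridge_open = True
    return True


def take_cola(state: WorldState) -> bool:
    if not state.fridge_open:
        return False  # precondition not met, report failure as feedback
    state.holding = "cola"
    return True


def hand_to_user(state: WorldState) -> bool:
    if not state.holding:
        return False
    state.delivered.append(state.holding)
    state.holding = ""
    return True


SKILLS: Dict[str, Callable[[WorldState], bool]] = {
    "open_fridge": open_fridge,
    "take_cola": take_cola,
    "hand_to_user": hand_to_user,
}


def run_task(instruction: str, state: WorldState) -> List[str]:
    """Agent loop: plan once, execute skills in order, stop at the first failure."""
    log: List[str] = []
    for step in plan(instruction):
        ok = SKILLS[step](state)
        log.append(f"{step}: {'ok' if ok else 'failed'}")
        if not ok:
            break
    return log
```

Calling `run_task("Give me a bottle of cola", WorldState())` walks through the three steps named in the interview. The per-step success flag is a simplified placeholder for the error handling and real-time feedback attributed to the cerebellum.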
"Intelligent Emergence": Now the robot brain can perform complex tasks, but the body has just learned to run. Does this mean that the robot's motion control and limbs cannot keep up with the development of the brain?
Xiong Youjun: This does not mean that the body and motion control are lagging behind. The integration of large models and robot technology only began in the past two years. To make robots smart enough to autonomously understand and execute more complex tasks, the entire embodied intelligence industry still has many challenges to overcome.
"Intelligent Emergence": What problems (原文中此句