Interview with Gao Yang of Qianxun Intelligence: It's not very "reliable" for scientists to start a business, but starting a business is like a game.
Intelligent Emergence Mapping
In the entrepreneurship of embodied intelligence, aim to be Apple, not Android.
Text | Qiu Xiaofen
Editor | Su Jianxun
Whether it's the just-concluded WAIC (World Artificial Intelligence Conference) or this week's upcoming WRC (World Robot Conference), how can you judge a robot's true strength on the exhibition floor?
Gao Yang, co-founder of the embodied intelligence company "Qianxun Intelligence", offers the following tips:
For robots claiming to be able to fold clothes, you can try to crumple the clothes into a ball and randomly throw them on the table, and observe whether it can continue to complete the action; or give it pants and coats to see if it has the generalization ability across categories.
When the robot is operating, you can observe whether its movements are smooth enough instead of jerky, which represents the coordination between thinking and movement...
Gao Yang, who gave us the guidance, is one of the popular entrepreneurs in the current field of embodied intelligence. After graduating with a doctorate from the University of California, Berkeley, he chose to return to China and became an assistant professor at the Institute for Interdisciplinary Information Sciences of Tsinghua University.
In 2023, he founded the embodied intelligence company Qianxun Intelligence together with Han Fengtao, the former CTO of Luoshi Robotics. Han Fengtao has rich hardware experience and has previously managed the mass production and shipment of tens of thousands of robots, while Gao Yang has a research foundation in AI. The combination of academia and industry has made Qianxun Intelligence a popular company in this wave of embodied intelligence.
In the 19 months since its establishment, they have raised a total of more than 1 billion RMB. The list of investors includes Huawei Hubble, JD.com, CATL, Shunwei Capital, etc.
Stepping from the "ivory tower" of universities into the business world, Gao Yang also has to face the prejudice against "scientist entrepreneurship" due to stereotypes, but he doesn't avoid it.
"Scientist entrepreneurship is not very reliable to some extent." In his opinion, scientists pursue truth and their work is driven by interest, while entrepreneurship focuses on business success. "I'm constantly admitting my limitations. I know what I'm not good at and try to make up for it."
Gao Yang compares entrepreneurship to "a kind of game", in which talking with investors and customers is the process of leveling up and defeating monsters. He has met hundreds of investors. At the beginning, his technical explanations were so obscure that they "put people to sleep", but he adjusted quickly once he got feedback. "Now I've become more proficient at dealing with investors. This is the growth process I like."
In this young entrepreneur's office, a small capybara doll is still stuck to his computer monitor. Gao Yang spoke with "Intelligent Emergence" about his journey from scientist to entrepreneur and his views on the technical path of embodied intelligence. The following is a transcript of the conversation (slightly edited):
Be the Apple, not the Android, in the field of embodied intelligence
Intelligent Emergence: In the field of robotics, you and Mr. Han make a good pair: one is a scientist in the software direction, and the other is an entrepreneur with rich hardware experience. What were your criteria for choosing a partner at that time?
Gao Yang: I thought for a long time about how to sell embodied intelligence to customers. To this day, one fairly obvious conclusion I've reached is that we have to integrate hardware and software. We have to be the Apple of embodied intelligence, not the Android.
That's because in the early stage of a technology, cross-embodiment capability, that is, the ability of the same software to work across different robot bodies, is bound to be relatively weak. The same has been true in the early stage of countless industries. For example, in the early days of personal computers, companies like IBM did both hardware and software. It took about thirty or forty years for the work to be gradually divided between hardware and software.
I've done a lot of software work myself, but I have little experience in hardware. So I think it's particularly important for a company in its first 30 years to be strong in both hardware and software.
On the other hand, actually many people in the hardware field don't embrace change, or they don't realize the change. But Mr. Han realized this change very early, and we thought alike.
Intelligent Emergence: What did you see in 2023 that made you have the idea of starting a robotics business?
Gao Yang: It was mainly the shift in the learning paradigm brought by ChatGPT. Before ChatGPT came out, I myself didn't believe in what OpenAI kept working on. Even many senior professors at Berkeley thought it was nonsense. But when they developed GPT-3.5, we reflected and realized that we were wrong. If you follow this logic, embodied intelligence is inevitable; it just takes some time.
Intelligent Emergence: You decided in 2023 that robots must combine hardware and software, but there are still leading robotics companies that ignore the "brain". What do you think of this?
Gao Yang: Leading companies have their own logic. Their logic is that they are very good at making hardware and can live well by selling to educational customers and even go public through this. Their best solution is to first stabilize the educational market and not let others take it away, because there are many other companies trying to do this business now. After going public, they can gradually do other things. It's difficult for a company to do many things at the same time, especially when there is fierce competition in the educational market.
Intelligent Emergence: If I build non-humanoid hardware, a new form of robot body, is there room to grow for companies that focus only on the body?
Gao Yang: The design of the robot body is strongly tied to the needs of the AI. For example, suppose I build a body, and when it stretches out its arm, the inverse kinematics solution fails and it can't reach the thing on the table. This kind of problem is very common. If you don't develop the hardware and the AI together, you won't even notice the problem.
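To make the reach failure concrete, here is a minimal sketch for a planar two-link arm. The function names, the link lengths, and the two-link simplification are illustrative assumptions, not Qianxun's hardware or software; real arms have more joints and are usually solved numerically. The closed-form solution exists only when the target lies within reach of the two links, so the function returns None otherwise.

```python
import math


def two_link_ik(x, y, l1, l2):
    """Inverse kinematics for a hypothetical planar two-link arm.

    Returns (shoulder, elbow) joint angles in radians for one of the two
    mirror-image solutions, or None when the target (x, y) is out of reach.
    """
    # Law of cosines gives the elbow angle from the target distance.
    cos_elbow = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(cos_elbow) > 1.0:
        # Target is farther than l1 + l2 or closer than |l1 - l2|.
        return None
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(
        l2 * math.sin(elbow), l1 + l2 * math.cos(elbow)
    )
    return shoulder, elbow


def forward(shoulder, elbow, l1, l2):
    """Forward kinematics, used here only to check an IK solution."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y
```

With two 0.3 m links, `two_link_ik(0.9, 0.0, 0.3, 0.3)` returns None because the target is farther away than the arm's 0.6 m total length. A check like this only flags the problem if the body's dimensions and the task's requirements are examined together.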
Intelligent Emergence: From the perspective of this industry, can't the market accommodate a second such company?
Gao Yang: I think it's difficult to accommodate.
From scientist to manager is a "game"
Intelligent Emergence: When Teacher Wu Yi asked you to come back from Berkeley, had you already planned to start a business? I remember you once mentioned that you thought doing scientific research back in China would be more challenging.
Gao Yang: At that time, I just wanted to come back to China to do research. There was no such opportunity for technological change at that time. Another option for me at that time was to be a research engineer in a big company in the United States. But that path was planned by others, and you just need to do a little thing well.
But if you become a professor, it's like starting a laboratory from scratch. There is nothing at the beginning, and there are no people. You have to build everything from the ground up. It's a challenge from 0 to 1. So I started my business around the second half of 2023, which was about the third year after I returned to China.
Intelligent Emergence: I feel that you don't just consider things from a scientific research perspective, but also from a business perspective.
Gao Yang: I'm very interested in how to make technology accessible to everyone. So I started to think about how to make robots well from a business perspective, then deduced that we need to integrate hardware and software, and then chose who would join me in starting the business.
Intelligent Emergence: Why do you think management is a technology? Because technology is more rigid and rational, but management also has some emotional components.
Gao Yang: Management is not a technology in the strict sense. It may sit somewhere between technology and art. Management does have rules to follow, but it's not like science and engineering, where you just follow a set of rules and everything will be okay. It still requires some flexibility.
Intelligent Emergence: You mentioned before that scientist entrepreneurship is not very reliable. Then when you practice it yourself, how do you make up for these additional abilities?
Gao Yang: Let me first explain why it's not reliable. Scientists pursue truth, and their work is driven by interest. But in entrepreneurship, the most important goal is to create a product. Many times, it's not about truth, but about how to serve customers well. Different customers may have many different demand indicators and dimensions.
In this process, you need to use the form of a company to achieve this goal, and there are also many professional techniques involved, such as how to build a team and cultivate the company as a growing entity.
I can't say that I'll definitely succeed 100%. I can only say that I'm constantly admitting my limitations. I know what I'm not good at and then try to make up for it.
Intelligent Emergence: Specifically for you, how did you complete the transformation from a scientist to an entrepreneur?
Gao Yang: I think it's about admitting my limitations, opening up to learn about entrepreneurship, and using the success of a business company to drive everything, rather than just exploring the truth.
Intelligent Emergence: Do you enjoy this process?
Gao Yang: I think I quite enjoy it. It's a very interesting game, and there are also many lessons. One of them is that at the beginning, I talked to investors in a fact-based way. I was very precise, but people got sleepy and bored.
Then I realized that I couldn't talk like that. I needed to use a more vivid way to explain things to them. There are many such lessons.
Intelligent Emergence: Do you also enjoy this process?
Gao Yang: In the real world, this is what I need to complete. As long as I want to do this thing well, I have to go through it.
Intelligent Emergence: How many investors have you met? Have you counted?
Gao Yang: I haven't counted, but it may be one or two hundred. And you have to talk to each of them.
Intelligent Emergence: In this process, how do you continuously correct the way you interact with investors?
Gao Yang: I think feedback is very important. Otherwise, you don't know what you're not good at. Now I've become more proficient at dealing with investors. This is the growth process I like.
Intelligent Emergence: Do you think this will be a relatively big challenge for you?
Gao Yang: I think it's okay. It may be like any other technology. It's just a special technology.
The secret to judging the quality of VLA: experience it yourself
Intelligent Emergence: At this stage, using Transformer for pre-training has become a consensus, but I wonder if there will be obvious differences in the effects in the later stage of engineering among different companies?
Gao Yang: I think you can go to the WRC site to have a look. Maybe there are thousands of theories, but people still have to experience it themselves. For example, can you interact with it? Crumple up some clothes and throw them to the robot to see if it can refold them.
Intelligent Emergence: This can be a guide for us when visiting robot exhibitions.
Gao Yang: Since a robot is a very large-scale system, it's difficult to figure out which one is better. I think the best way is to experience it yourself and see what each company's model can actually do.
Intelligent Emergence: Everyone is talking about VLA this year. How to judge the quality of each company's VLA?
Gao Yang: One is the algorithm. For example, some VLAs can't decompose tasks. The VLA of Qianxun has a fast-slow system, which can make the movements very smooth. Robots without a fast-slow system will have stiff and jerky movements.
On the other hand, it's about data. Large models need a lot of data for training. The model we developed ourselves uses human video data from the Internet for pre-training, while some VLAs can't pre-train on human videos, so their performance is relatively poor.
From a technical perspective, it's these two points. The characteristics of the algorithm, the data used for training, and the cleaning, processing, and proportioning of the data will all affect the effect.
In terms of what you can observe for yourself, it's about how complex a task the robot can perform. For example, some models can only perform relatively simple tasks, which we call pick and place. But the model of Qianxun can perform complex tasks like folding clothes. You can even cause some disruptions, and it can still complete the task very well.
Intelligent Emergence: Is the VLA model of Qianxun's Spirit v1 derived from your two previous studies (ViLa and CoPa)?
Gao Yang: It's not only from those two studies, but from many studies. Even one-two VLA has been engineered into Qianxun's model.
Intelligent Emergence: What's the difference between your one-two VLA and ordinary VLA?
Gao Yang: If you tell it to do something a little more complex, like putting a phone in a drawer, it may require three steps: pick up the phone, open the drawer and put it in, and then close the drawer. Ordinary VLA can't do this, but one-two VLA can decide by itself when to decompose the task into smaller tasks and then complete them. But if you tell it a very simple task, it won't continue to decompose it.
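The control flow described in that answer can be illustrated with a toy sketch. Everything here is hypothetical: `KNOWN_PLANS` is a hand-written lookup table standing in for whatever learned decision the real model makes, and `run_task` and `act` are names invented for this example. It only shows the idea of decomposing a complex instruction while passing a simple one straight through, and says nothing about how one-two VLA is actually built.

```python
from typing import Callable, Dict, List

# Hypothetical table standing in for the model's own judgment about
# which instructions need to be broken down. A real VLA would decide
# this from images and language, not from a dictionary.
KNOWN_PLANS: Dict[str, List[str]] = {
    "put the phone in the drawer": [
        "pick up the phone",
        "open the drawer and put the phone in",
        "close the drawer",
    ],
}


def run_task(instruction: str, act: Callable[[str], None]) -> List[str]:
    """Execute an instruction, decomposing it only when needed.

    Returns the list of primitive steps that were actually executed.
    """
    steps = KNOWN_PLANS.get(instruction)
    if steps is None:
        # Simple task: act directly, with no further decomposition.
        act(instruction)
        return [instruction]
    executed: List[str] = []
    for step in steps:
        # Each sub-task goes through the same decision again.
        executed.extend(run_task(step, act))
    return executed
```

Calling `run_task("put the phone in the drawer", print)` runs the three sub-steps in order, while `run_task("pick up the phone", print)` acts once without decomposing.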
Intelligent Emergence: You previously made a judgment that in four years, we will reach the stage of Robot GPT3.5. What are the characteristics of this stage?
Gao Yang: At the Robot GPT3.5 stage, basically, whatever you tell it to do, it can complete about 70%-80% of it. For example, it can enter the home and fetch a bottle of water from outside. But it may not work 100% of the time, maybe only 70%.
Intelligent Emergence: The industry has also made a lot of reflections on the VLA route. What do you think are the parts that can be revised?
Gao Yang: I agree with what Chen Jianyu (founder of Xingdong Jiyuan) said before. There is indeed too much of the "L" part in VLA because this model actually doesn't need to understand such complex language. There is still a lot of room for improvement in the specific technology of VLA.
Intelligent Emergence: Then how to improve it specifically?
Gao Yang: In practice, there are many aspects. First, at the data level, there is the question of how to make better use of human video data from the Internet. Currently, robots widely use image-and-text data from the Internet, but Qianxun Intelligence has been using human video data from the Internet, because human videos are intuitively related to the tasks that robots perform.
Secondly, how to use teleoperation data for continuous and effective supervised fine-tuning of VLA, and how to let VLA perform reinforcement learning in the physical world. Supervised fine-tuning relies on humans collecting data for the model, whereas reinforcement learning is done by the robot itself.
Third, at the architecture level, as Teacher Chen mentioned, how to reduce the "L" part, and how to design a better action tokenizer. These are also areas that can be continuously explored and improved.
Intelligent Emergence: Is the fast-slow system also our original technical feature? When was it completed?
Gao Yang: Yes, it was about 4 months ago.
Intelligent Emergence: Now that the fast-slow system has been developed, what significant improvements does it bring, for example in the robot's movements?
Gao Yang: You can see that some robots are jerky when doing things because their models don't have a fast-slow system.
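The interview does not describe how Qianxun's fast-slow system works internally. As a rough illustration of why a two-rate design can look smoother, the sketch below assumes one common reading of the idea: a slow module emits a new target only occasionally, and a fast loop moves toward the latest target on every control tick. The function names and the one-dimensional "position" are assumptions made purely for this example.

```python
def slow_only(targets, ticks_per_plan):
    """Command each slow-module target directly and hold it until the next.

    This models a robot that only updates its command when the slow
    module produces a new output.
    """
    return [t for t in targets for _ in range(ticks_per_plan)]


def fast_slow(targets, ticks_per_plan, start=0.0):
    """Fast loop: on every tick, take an even step toward the latest target."""
    position, trajectory = start, []
    for target in targets:
        for remaining in range(ticks_per_plan, 0, -1):
            # Spread the remaining distance over the remaining ticks.
            position += (target - position) / remaining
            trajectory.append(position)
    return trajectory


def max_jump(trajectory, start=0.0):
    """Largest change in commanded position between consecutive ticks."""
    previous, worst = start, 0.0
    for position in trajectory:
        worst = max(worst, abs(position - previous))
        previous = position
    return worst
```

For targets `[1.0, 2.0, 3.0]` with 10 ticks per plan, the slow-only commands jump by a full unit each time a new target arrives, while the fast loop covers the same distance in steps of about a tenth of a unit and ends at the same final position.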