
Wang Qian of Independent Variable Robotics: The large model for embodied AI cannot simply be copied from foreign models.

Wang Fangyu 2025-05-29 09:00
Wang Qian previously founded a quant fund in the United States. Yet he “often could not sleep at night because he kept thinking about building robots.” In 2023, he dissolved the fund and returned to China to start his own company.

Text | Wang Fangyu

Editor | Su Jianxun

Wang Qian has the appearance of an intellectual and speaks in a calm tone. But once the topic turns to embodied intelligence, he shows a “murderous” side:

“If you just follow others, you will naturally fall behind in technology. It's really unpromising.”

“Starting a business requires some determination. If you have already found an escape route from the beginning, your mindset is wrong.”

Robots are what Wang Qian is most obsessed with. He earned his bachelor's and master's degrees at Tsinghua University and pursued his doctorate at the University of Southern California in the United States. He once founded a quantitative fund company in the United States. But after doing quantitative work, he “couldn't sleep well for a long time and regretted not continuing his robotics work.”

△Source of the picture: Authorized by the enterprise

In 2023, Wang Qian dissolved the fund and returned to China. He founded “Independent Variable Robotics” in Shenzhen.

In less than a year and a half since its establishment, Independent Variable Robotics has completed 7 rounds of financing, raising over 1 billion yuan in total. On May 12th, Intelligence Emergence exclusively reported that the company had received a new financing round of several hundred million yuan funded solely by Meituan.

2023 was the year when the domestic embodied intelligence track began to flourish. Jensen Huang, the founder of NVIDIA, predicted for the first time that embodied intelligence would be the next technological wave. Galaxy Universal and Zhipu Robotics were both established that year.

Compared with these two companies, Independent Variable Robotics did not gain much attention in the early stage. But with continuous new financing, it is gradually moving towards the center of the embodied intelligence stage.

An investor from a dual-currency institution told Intelligence Emergence that, judging from the financing amounts, domestic humanoid robot startups have clearly separated into tiers. There are three companies in the first tier: Unitree Robotics, Zhipu Robotics, and Galaxy Universal, each with over 1.5 billion yuan in financing. Independent Variable Robotics, with over 1 billion yuan in financing, has moved up from the second tier into the quasi-first tier.

Just as in the previous AI large-model wave, there are two completely opposite attitudes in the domestic embodied intelligence field: optimism and pessimism. On the one hand, Zhu Xiaohu is pessimistic: “Now every humanoid robot can do somersaults, but where is the commercialization?” On the other hand, investment institutions continue to pour in large amounts of money, and startups are accelerating mass production and giving optimistic growth expectations.

Wang Qian is a typical representative of the technology-believing faction.

Since its establishment in 2023, Independent Variable Robotics has firmly chosen the technical route of the “end-to-end unified VLA large model” and has pushed R&D forward at a pace of one model update every 2-3 months.

One year later, with the release of a model from the American company Physical Intelligence (PI), VLA became the mainstream route in the industry.

While most other manufacturers' models are still performing simple Pick & Place operations (grasping an object and placing it elsewhere), the WALL-A model developed by Independent Variable Robotics can already enable robots to complete a number of complex and delicate operations such as clothing handling, storage organization, and wire harness arrangement.

△Independent Variable's robot independently makes shaved ice at the GAIE2025 exhibition. Source of the picture: Authorized by the enterprise

The pessimistic view in the market is that “general embodied intelligence is still too early and the commercialization is not clear.” But in Wang Qian's view, the development process of the industry is much faster.

He expects an embodied intelligence large model at the level of GPT-3 to appear in about a year. The real commercialization cycle of embodied intelligence will also gradually unfold over the next one to two years.

Currently, the commercialization scenarios of embodied robots mainly come from two markets: scientific research and education, and reception performances. But in Wang Qian's view, both markets are relatively small, have limited significance for the long-term development of the industry, and cannot be regarded as the final target markets. Regarding humanoid robots entering factories to do simple and repetitive work, he even said bluntly that “it's actually just a PR (public relations) act.”

Wang Qian believes that to achieve truly valuable commercialization, it is necessary to rely on the improvement of the generalization ability of the embodied intelligence model.

Currently, Independent Variable is not in a hurry to promote commercialization. Instead, it focuses on improving the model's capabilities. Two-thirds of the company's expenditures are invested in the model and its related businesses.

“Without being modest, Independent Variable is in the leading position in domestic embodied intelligence models. Investors naturally give some preferential treatment to the first-place company. Everyone believes that we can achieve a very high upside and hopes that we can be more focused on the grand goal of the general embodied intelligence model,” Wang Qian said confidently.

The following is a dialogue between “Intelligence Emergence” and Wang Qian, the founder of Independent Variable Robotics. The content has been slightly edited:

“The integrated end-to-end model has a higher development ceiling.”

Intelligence Emergence: In the past six months, what important new progress has the company made in terms of model capabilities?

Wang Qian: Our progress is quite fast. On average, we update the model every 2-3 months.

Previously, Independent Variable's model purely output actions: it took multi-modal input and produced single-modal output. Starting from October and November last year, we began to develop an any-to-any model, with multi-modal input and multi-modal output. In addition to outputting actions, it can also output language and vision.

Under the framework of full-modal fusion, Independent Variable also performs long Chain of Thought (CoT) reasoning. We developed the Chain of Thought roughly between these two rounds of financing.

In March this year, Google announced its progress on Gemini Robotics, which takes a similar approach: any-to-any and CoT. Recently, the newly released π0.5 from Physical Intelligence (PI) has also done something similar. So in fact, we predicted the direction of technological progress very early, and we did this at around the same time as foreign players like PI.

So we dare to say that our model is basically on the same level as those of PI and Google, because we really achieved similar things at a similar time with similar results. Domestic manufacturers, by contrast, have generally only just started to move in this direction, so there is still a big gap in progress.

Intelligence Emergence: Has the unified end-to-end VLA large model (Vision-Language-Action Model) become the mainstream technical route?

Wang Qian: Yes. To a large extent, this was driven by the release of PI's new model in October last year. People can see that end-to-end is a good direction and a major trend.

Now, basically, whether people believe it or not, at least they will wave this flag. But in fact, there are still big differences in how well they do it and in whether they really implement end-to-end. At the same time, you will find that there is a lot of so-called “definition science” in the market, and people are “re-inventing” what end-to-end means.

I should add that there are two different approaches within the end-to-end route. One is like Figure's two-layer model path: a high-level VLM for reasoning and planning, and a low-level VLA for the actual action generation. The other makes no such distinction and is an integrated end-to-end approach.

We also tried the two-layer model in the early stage, but found that the ceiling of the single-layer model is significantly higher than that of the two-layer model. So Independent Variable prefers the unified end-to-end paradigm.

△Source of the picture: Authorized by the enterprise 

Intelligence Emergence: What are the technical routes parallel to the end-to-end route?

Wang Qian: There are several parallel routes, but few people actually pursue them now. Mainly, they use three-dimensional vision or other methods for perception and then add some traditional control to do Pick & Place operations (grasping and placing).

The above methods may be suitable for some scenarios, such as very simple Pick & Place tasks, including the previous-generation industrial automation scenarios. But obviously, this is not what we are pursuing. Figure and Boston Dynamics used to adopt this method, but now they have turned to the end-to-end approach.

Intelligence Emergence: If compared with AI large models, at which stage is the current embodied intelligence model of Independent Variable?

Wang Qian: I think it is still at the GPT-2 stage. GPT-3 had some obvious features that our current model does not have enough scale to achieve. The progress of industry players like PI and Google is similar, which is dictated by the Scaling Law.

Intelligence Emergence: How long will it take for domestic embodied intelligence large models to achieve commercialization?

Wang Qian: Actually, it will take about a year at the fastest and about two years at the slowest. I'm talking about real commercialization, which means users are actually willing to pay. Of course, commercialization has different stages. Entering the consumer (C-end) market, with products such as household nanny robots or indoor service robots, will take longer, maybe 3-5 years.

People generally overestimate short-term technological progress and underestimate medium- and long-term technological progress; it will be faster than people think.

Intelligence Emergence: When it comes to embodied model training, people always say that data shortage is a bottleneck. Do you have enough data?

Wang Qian: Data is a problem with a timeline. For example, when you have no perception or understanding of the embodied model at the beginning, collecting a large amount of data may not be the right solution. Most of the data collected may be useless or of low quality. So the amount of data you collect should match your understanding of embodied intelligence.

Improving the data collection scale is just one aspect. How to improve the data quality and deeply understand what kind of data is needed is another aspect. Independent Variable has done a lot of work on the latter, which is a more efficient way.

Currently, the quality of some open-source datasets and third-party data is generally substandard. If you actually use such data for model training, the model will not perform particularly well. This data can be used as a supplement, but you cannot rely on it completely. Currently, our data is mainly collected by ourselves.

Intelligence Emergence: In this wave of the embodied intelligence boom, domestic startups are generally quite cautious in spending money, as if they are preparing for a cooling-off period. What do you think?

Wang Qian: First of all, Independent Variable is quite cautious in spending money. We will never spend money on things that are not necessary. We are doing a long-term and important thing and need to prepare for possible fluctuations in the industry.

But on the other hand, we still need to spend money when it is necessary. Without spending money, we really can't achieve anything. If we always wait for foreign open-source results and then follow or copy them, it's really unpromising, and we can never achieve the ultimate goal of general robots.

The issues of confidence and preparing for a difficult period actually reflect a lack of ability, which leads to a lack of confidence. If you really have enough ability and judgment, you won't think about this problem in this way. The initial team genes and ability levels will determine many strategic judgments and ways of looking at problems.

After all, why does the trough in the industry come? It's because the industry has not achieved any real results. Once the results are achieved, there will naturally be a peak. Why not be the company that leads the peak and the investment boom instead of passively adapting to the environment? I think this is the mindset that an entrepreneur should have.

“The value and significance of some commercialization scenarios are questionable.”

Intelligence Emergence: How do investors evaluate the technical capabilities of Independent Variable? Do they rely on demo videos or on-site demonstrations with real machines?

Wang Qian: We always do on-site demonstrations with real machines. Since the first day of its establishment, Independent Variable has insisted that on-site demonstrations with real machines are the most important. There are too many ways to fake videos. Only on-site can you see the real performance of the model. You even need to interact with the robot on-site and deliberately interfere with it to see how the model performs under various extreme conditions. This can really reflect the level of the model.

Intelligence Emergence: At the current valuation level, do investors now have commercialization requirements for Independent Variable?

Wang Qian: It depends on the investors. Some investors focus more on how high the upper limit of the embodied intelligence model's capabilities can reach, while others focus more on commercialization. The preferences of different investors vary greatly.

Independent Variable is a bit special. Without being modest, we are in the leading position in domestic embodied intelligence models. Investors naturally give some preferential treatment to the first-place company. Everyone believes that we can achieve a very high upside, so they won't require us to commercialize just for the sake of it. They hope that we can achieve “valuable” commercialization and be more focused on the grand goal of the general embodied intelligence model.

Intelligence Emergence: You haven't officially released the physical product yet. How do you meet the commercialization requirements of the other part of the investors?

Wang Qian: Actually, we already have a physical product, but we haven't officially launched it on a large scale. Our physical product has already been sold and put into use, mainly in service-related scenarios. In addition to the current model, we will also launch new physical products.

△Source of the picture: Authorized by the enterprise 

Intelligence Emergence: Is the technology for embodied intelligence to enter the service industry mature now?

Wang Qian: We are still in the POC (Proof of Concept) stage with our seed customers. There is still a good chance from the end of this year to the beginning of next year. Of course, we still need to do a lot of engineering work at present. And we won't be limited to simple Pick & Place operations (grasping and placing).

Overly simple Pick & Place operations do not help the further training and development of the embodied intelligence model. The previous