
On the "snobbery chain" among robot startups | Focus Analysis

Qiu Xiaofen, 2024-11-25 13:19
The software-versus-hardware dispute has exposed how fragmented the robot industry currently is, but a correction has already begun.

Author丨Qiu Xiaofen

Editor丨Su Jianxun

In 2024, the startup wave around "embodied intelligence" set off by large models has only just begun, yet after interviewing dozens of robot companies, Intelligent Emergence found that a "snobbery chain" has already formed in the industry.

"Those who focus on software look down on those who focus on hardware, and those who work on large models look down on those who work on reinforcement learning." Multiple industry insiders have made similar observations.

Different perceptions determine the entry strategies of robot entrepreneurs.

Wang Sheng, a partner at InnoAngel Fund, told Intelligent Emergence that the hundreds of robot startups that have emerged in China fall into roughly three types of "genes":

The first is founding teams that come from the robotics field. They pay more attention to the robot's hardware capabilities, mainly control and motors. For this "robot body" school, the core is the robot body itself (a humanoid or a quadruped robot dog), followed by the robot's arms or joints.

The second is teams with a software background, who pay more attention to the robot's intelligence and generalization capabilities. But "software" can be further subdivided:

One group is entrepreneurs from the previous AI boom, in areas such as computer vision (CV) and reinforcement learning, who have moved into robotics. The other group is players with a genuine large model background. There are fewer of them, and they sit at the top of the snobbery chain.

"We look down on those domestic companies that focus on hardware," the founder of an embodied intelligence "brain" company told Intelligent Emergence bluntly. In his view, software is the bottleneck in robot development, yet hardware companies currently budget far too little for AI software: "They just simply connect to foreign open-source large models."

Unitree Technology is a typical member of the "hardware school". Its founder, Wang Xingxing, once responded to this in a public interview: the company is very restrained in its AI investment because AI is too costly. "Robots are our foundation," he said, adding bluntly, "We welcome customers to use our hardware, even if they delete all our software."

A person from a robot hardware company said with some resignation that there is currently no consensus on the "software" side of robots: the industry has too many technical paradigms and routes. Where is the boundary between the "brain" and the "cerebellum"? How exactly should embodied intelligence be achieved? The field remains chaotic, with many questions unanswered.

Unitree G1

The result of the "software-hardware dispute" is that most domestic hardware companies still build robots with a traditional hardware mindset and apply the "brain" only superficially, while most companies focused on the "brain" choose to build their own hardware from scratch.

With each camp going its own way, the industry has quietly become fragmented.

Large models cannot yet "empower" robots

"Nowadays, hardware companies are like video production companies!" multiple investors and industry insiders told Intelligent Emergence.

This year, the scenes many robot manufacturers show in their demos are impressive: robots move things in a car factory or help sort goods on shelves, giving the impression that AGI-era robots have already arrived.

But the reality is different.

Behind a perfect demo, the reality is often this: if a robot was trained to take a water cup from a drawer in the morning but the shoot happens at night, or the drawer holds two more cups than it did during training, or the drawer has been moved, any of these minor changes can cause the task to fail.

"Some demos only succeed once in ten thousand attempts, and the dishonesty (in the videos) is very serious," an industry insider said.

But you may wonder: since large models are already intelligent enough on devices such as mobile phones and computers, why can't they make the robot's brain smarter, so that hardware companies would not have to work so hard on the "perfect" demo?

Figure 02

According to Intelligent Emergence, most hardware companies currently lack a deep understanding and application of large models. Basically, they simply connect to general-purpose large language models from China or abroad. In fact, there is still a long way to go between large models and the "spatial intelligence" that robots really need.

Multiple industry insiders told Intelligent Emergence that the larger the data volume of a large language model, the more likely it is to produce "hallucinations" that interfere with task execution. "Large language models have nothing to do with the practical application of robots. The success rate in regional tasks is disastrous!"

The aforementioned founder of the embodied intelligence brain company said that no team in China has yet truly started from the robot's perspective to develop a large model suited to embodied intelligence.

Previously, one solution in the robot industry was to introduce an intermediate layer, the "cerebellum", between the multimodal large model and the robot body. Its role is to connect the upper and lower levels: it stores multiple "sub-tasks" for the brain to allocate (for example, the task of having the robot make coffee is split into sub-tasks such as "take the cup", "grind the beans" and "pour water"), and it enables the robot body to understand and carry out the movements.
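To make the layering concrete, here is a minimal Python sketch of the brain / cerebellum / body split described above. It is only an illustration under stated assumptions: the names `Cerebellum`, `plan_with_brain` and `run`, and the lookup table standing in for the multimodal large model, are hypothetical and do not come from any company mentioned in this article.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names here (Cerebellum, plan_with_brain, run, the sub-task strings)
# are hypothetical and exist only for this illustration.


@dataclass
class Cerebellum:
    """Middle layer: a library of preset sub-tasks the robot body can execute."""
    skills: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def register(self, name: str, motion: Callable[[], str]) -> None:
        # A manufacturer would preset each sub-task ahead of time.
        self.skills[name] = motion

    def execute(self, name: str) -> str:
        if name not in self.skills:
            raise KeyError(f"no preset sub-task named {name!r}")
        return self.skills[name]()


def plan_with_brain(task: str) -> List[str]:
    """Stand-in for the multimodal large model: maps a task to sub-task names."""
    plans = {"make coffee": ["take the cup", "grind the beans", "pour water"]}
    return plans.get(task, [])


def run(task: str, cerebellum: Cerebellum) -> List[str]:
    """The brain allocates sub-tasks; the cerebellum turns each into body motion."""
    return [cerebellum.execute(step) for step in plan_with_brain(task)]


cerebellum = Cerebellum()
for skill in ("take the cup", "grind the beans", "pour water"):
    # Bind the current skill name via a default argument.
    cerebellum.register(skill, lambda s=skill: f"body executes: {s}")

log = run("make coffee", cerebellum)
print(log)
```

In this sketch the cerebellum can only execute sub-tasks that were registered in advance; asking it for anything else raises a `KeyError`.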

But new difficulties have emerged. On the one hand, introducing the cerebellum means that robot manufacturers need to preset countless sub-tasks in it, and when the robot encounters a complex task, each sub-task has to be split further.

On the other hand, the lack of data is an even harder problem. The government and some companies have invested heavily in centralized training, but multiple industry insiders said the results are not ideal, because no one knows what kind of data should be collected or how to define the standard for high-quality data.

Take grabbing a cup as an example. Ideally, high-quality data would come from controlling a robotic arm through the entire action from start to finish, showing it how to grab the cup and how to grab it when anything in the scene changes. But this also means that a single simple grasping action requires tens of millions or even hundreds of millions of data points.

Those who entered the robot field believing that "large models can change everything" have found the gap to be much larger than imagined.

A collective correction

For the industry, this fragmentation is unhealthy, and investors and industry players are now undergoing a "correction".

In the second half of 2024, the investment trend in the robot industry quietly shifted. Wang Sheng told Intelligent Emergence that before 2024, many investors simply assumed that investing in robots meant investing in humanoid robot hardware.

Over the past year and a half, humanoid robot companies have boomed and their valuations have soared. According to Intelligent Emergence, Unitree Technology and Zhiyuan Robot, two popular robot hardware manufacturers, have each completed several financing rounds in the past two years, and both are now valued at more than $1 billion. "It's too expensive for everyone to invest."

Zhiyuan Expedition A2-max

At that time, many domestic companies specializing in the robot cerebellum or brain were struggling to raise funds, and some even had to rethink how to tell their story to the market.

"Before the second half of this year, when we told investors that we were building a unified large model for robots and talked about end-to-end, who would have believed it?" the aforementioned founder of the robot brain company told Intelligent Emergence. "But today, if you don't talk about end-to-end, you'll be embarrassed to go out."

It was not until the second half of this year that the investment trend reversed.

The founder of a company that makes humanoid robot joints told Intelligent Emergence bluntly that this year it has clearly become difficult for pure hardware startups to raise large rounds: "The market is a bit cold."

Recent news points the same way. Overseas, Skild AI and Physical Intelligence have seen their valuations soar to tens of billions in a short period of time; in China, the robot cerebellum company "Xinghaitu" recently received an investment from Ant Group, and the brain companies "Ziliangshu" and "Qianjue Technology" have also raised funding one after another.

"Now investors have changed from investing in humanoid robots to investing in embodied intelligence," Wang Sheng told Intelligent Emergence, because everyone now realizes that embodied intelligence is the key to making robots generalize better across tasks.

The rethinking is not limited to investors. A group of hardware manufacturers has also begun to collectively review the old model: in the past, robot manufacturers mostly collected data in specialized fields and then built for specialized scenarios. Everyone believed that "Generalization is useless."

Now, although large models cannot yet be put to real use, what they have taught this group of entrepreneurs is that perhaps they should first set the specialized scenarios aside, build general foundation model capabilities, and then develop specialized capabilities on top of them. This may be the key to solving the generalization problem systematically.

The software-hardware dispute in the robot field shows that, under the impact of AI, the robot industry is currently in a chaotic state where consensus has not yet been reached. But the one certainty amid the uncertainty is that the endgame for robots is the combination of hardware and embodied intelligence, and both software and hardware are indispensable.

"Whether you start from software or hardware, the endgame is similar. It depends on who has higher business efficiency," an industry insider said. "And in the AGI era, robot companies essentially need teams that understand both AI and hardware. But more importantly, everyone needs to identify with each other."

(All images in this article are from official channels)
