
In-depth discussion on the robot bubble: Unveiling the real logic behind the "hype"

Silicon Valley 101 (硅谷101), 2025-12-01 15:16
Pierce through the fog and see the real landscape of the embodied intelligence track clearly.

Recently, the humanoid robot industry has once again been pushed to the forefront of public opinion, still wavering between the debates of "bubble" and "pre-dawn."

In Silicon Valley, startup 1X recently released a demonstration video of Neo. In this beautifully shot advertisement, Neo seems able to do household chores as naturally as a human being. The video instantly sparked heated discussion on X and YouTube. However, before the praise had faded, a deluge of doubts poured in. Critics pointed out that its smooth movements actually relied heavily on teleoperation rather than on the robot's autonomous intelligence. This inevitably reminds people of those startups in 2023 that claimed to have AI capabilities but actually relied on manual back-end processing. The specter of "artificial" intelligence has reappeared.

Meanwhile, in a recent research report, Goldman Sachs bluntly pointed out the "real-world temperature difference" in the robot supply chain. Despite buoyant sentiment in the capital market and enterprises' aggressive capacity plans, generally ranging from 100,000 to 1 million units per year, large-scale orders have not yet materialized. Goldman Sachs even predicts that by 2035, total global shipments of humanoid robots may reach only 1.38 million units.

Despite doubts about "fraud" and the risk of "over-capacity," the embodied intelligence industry has attracted huge amounts of capital in the past two or three years and has shown strong momentum, evolving in step with AI technology.

In this issue of "Silicon Valley 101," special researcher Liu Yiming invited two senior investors deeply involved in the Chinese and American markets: Jonathan Qiu, overseas partner of Huaying Capital, and Christine Qing, partner and investment vice-president of Shanda Group, to look through the fog of capital and examine the real landscape of the embodied intelligence industry. Is the current prosperity a preview of a bubble or the pre-dawn of a technological explosion? What differences are there in the strategic layouts and core advantages between Chinese and American enterprises? In the race for commercialization, which scenarios are likely to break through first?

The following is a selection of the content of this dialogue:

01

Is the current robot industry on the verge of a bubble burst?

Yiming: Let's first build a panoramic view. Recently, the video released by 1X has sparked a lot of controversy and is called "artificial" intelligence. At the same time, Goldman Sachs' report also points out the huge gap between production capacity and orders. From an investment perspective, in 2025, is the humanoid robot industry on the verge of an explosion similar to that of ChatGPT, or is it already obviously overheated, or even on the verge of a bubble burst?

Qiu: We've also been having very intense discussions about this topic internally. A core view is that there must be a certain degree of overheating, but we think that there will always be overheating before any major technological explosion. So as investors, we actually still hope to find some relatively clear opportunities in this overheating, rather than completely denying it because of the overheating.

If you ask me about the current specific positioning, I define the current stage as the "BERT period."

You may remember that the Transformer architecture was introduced in 2017, and then in 2018, Google launched the BERT model. The significance of the BERT period is that we already have a relatively clear technical route and roughly know the direction to go. Mapping this to today's robot field, we see VLA (Vision-Language-Action) models such as RT-2 and Pi0, which all follow a seemingly clear technical route.

We need to distinguish two concepts. In fact, the so-called "GPT moment" has two stages.

The first stage is the GPT-3 moment, which occurred in 2020. Its hallmark was the appearance of "emergence." Simply put, the large amount of Internet data accumulated up to that point could finally be used and trained into the model. If you still remember the BERT era, when we looked at a bunch of BERT projects, its significance was to define the pre-training technical route. The "P" (Pre-trained) in GPT was actually defined during the BERT period. However, because BERT was not generative, a lot of data could not be trained in.

It wasn't until GPT-3 appeared in 2020 that all the data was trained at once, creating a super-large model with 175 billion parameters. I think this (data emergence) is the sign we are most looking forward to in the robot field.

I've always held the view that everything is driven by training data. Although there are so many ways to collect training data today, such as teleoperation, motion capture, and simulation, in fact, the industry has not yet trained a real model with a certain scale in terms of both parameters and performance.

The second stage is the ChatGPT moment. In fact, when GPT-3 first emerged, people couldn't really use it at scale because, to be honest, its answers were often not very accurate. It wasn't until ChatGPT came out with post-training optimization such as RLHF (Reinforcement Learning from Human Feedback) that people suddenly found the results remarkable and the model genuinely usable.

So, back to robots, we are still waiting for the first moment, the GPT-3 moment. We are very much looking forward to that emergence. If we're lucky, given that it took only two years to get from 2018 to 2020, we may be very close to this moment.

Neo demonstration advertisement. Image source: 1X

Christine: I very much agree with Jonathan's view. The breakout will definitely come in two stages.

We have indeed done some in-depth thinking about the technical milestones in the first stage. If the first stage has a GPT moment in terms of technology, what is its sign? I think it may be that the embodied intelligent robot can by then generalize across long-horizon action chains.

This means that the robot can directly receive human instructions through language and vision, and then decompose them into a series of complex actions and carry those out.

Take the instruction "Go to the kitchen, get a cup, pour water, and then put it back on the table." The ability involved here actually spans L0, L1, and even some L3 capabilities. This is no longer a simple scripted instruction operation; it is end-to-end generalization. When this ability appears, we can say that we are very close to, or have even reached, the first step of the GPT explosion.

The second step, I think, is a moment similar to that of ChatBot or even the iPhone. Its most substantial sign is a large-scale explosion in product usage on the consumer side (the "C-end").

Will robots replicate the explosive scale of software? I think it's difficult, because a robot is after all a combination of software and hardware, and it also has to be deployed in a specific usage scenario. However, I think another model we can compare it to is the iPhone. It started slowly, but once it had data and usage scenarios were established, it grew very fast, and its market is very stable and large.

As for the issue of "overheating," I look at it from two dimensions. In terms of the actual capabilities of products or demos (such as the 1X video) and the technological maturity, the industry really cannot justify today's valuations. The current valuations are indeed a bit high. However, if we look forward and consider the future market scale, we are only just beginning to approach the possibility of "physical AI," and the potential of this market is huge. For venture capital, such valuations are digestible, and this is a position that must be staked out early.

02

Chinese and American robot stories: The "brain" in Silicon Valley and the "body" in Shenzhen

Yiming: Both of you often travel between China and the United States and have seen many startups and listed companies. Everyone is talking about the comparison between China and the United States, whether in the AI or robot fields. In your opinion, what are the similarities and differences in the strategic approaches or core advantages between American companies represented by Tesla Optimus, Figure, and Pi, and Chinese companies like Unitree, Zhipu, and Ubtech? Which side is more advanced?

Qiu: We have indeed invested in quite a few domestic embodied intelligence projects this year. Since February, we have invested in eight or nine companies. I've also been based in Silicon Valley and have spoken with most of the embodied intelligence companies here.

Strictly speaking, I think the two sides are quite similar in many aspects.

Whether in China or the United States, there are indeed quite a few companies that adopt a financing-oriented and marketing-oriented approach. In fact, many videos contain a lot of CGI or speed-ups, or rely on cruder methods such as shooting many takes and using only the one that works. This is one type. Of course, there are also those that take an academic approach, constantly publishing papers, often winning on the strength of those papers, and continually coming up with new architectures and models.

The differences between China and the United States may be more in the segmentation of the technology stack.

The United States is definitely still relatively more "soft," especially in the field of large models. In terms of driving the progress of embodied models from the foundation model, the United States is still leading. Whether it's Pi, Skild AI, or Fei-Fei Li's company, they all have a strong academic flavor, emphasizing breakthroughs from the underlying models.

In terms of hardware iteration, China has a huge advantage. However, my view is that in the end, there must be integration, and the two sides need to integrate. The progress of many general basic models for robots will definitely also promote the entire technology stack, including the progress of hardware. So there is still a lot of communication between the two sides. China will pay great attention to the latest model progress in the United States, and in fact, the United States also needs to rely on the more mature domestic supply chain in many cases.

Christine: I completely agree with Jonathan's view. The United States definitely has to start with general basic models. In their perception, hardware is just a physical carrier for actions.

In China, though, I've spent this year in a learning mindset. I've mainly gone back to learn and see what level "hard technology" has reached.

I just came back from Shenzhen this week after visiting many upstream and downstream enterprises, including those doing hardware, software, and complete machines. The most common thing I heard, which I think is very interesting, is that in Shenzhen, robot hardware products can even be iterated three times a day.

I think this speed is something that Silicon Valley can't even dream of. Silicon Valley has neither the courage nor the ability to do this.

So I think they each have their own strengths. But how to turn these "strengths" into comprehensive capabilities is also something we've been constantly thinking about. Embodied intelligence has both a "body" and "intelligence." When it finally comes to scenarios, how should they be integrated?

In this regard, I think Tesla has learned the best. After all, Elon Musk learned about Chinese production at the Shanghai Gigafactory for so many years, and he must have gained something. He knows how to combine extreme manufacturing efficiency with top-notch software capabilities, so Tesla is indeed the best so far.

Optimus robot. Image source: Tesla / X

Yiming: At present, because software generalization has not yet been widely achieved, some hardware innovations may be able to produce more immediate results. In terms of commercial applications, do you think hardware companies will go further, or do both sides have to wait for each other's progress?

Qiu: In the end, it must be vertical integration.

Of course, commercialization can be divided into several types. The first is short-term commercialization. If you have some hardware on hand, you try to sell it and find short-term customers. This of course also counts as commercialization. But as venture capitalists, we are looking at long-term capital, meaning commercialization that spans cycles and can ultimately deliver a breakthrough in embodied intelligence technology.

From this perspective, both sides definitely have to work together.

The United States really needs the support of the supply chain. In fact, this has been the case for many years. The United States has only recently started talking about bringing the supply chain back and having its own domestic alternative. But for at least a decade or two, it has been heavily dependent on supply chains abroad.

In fact, more than a decade ago, the most famous incubator for smart hardware in the United States was called Highway1, and another was called PCH. They had a batch every year or every six months, and all the dozen or so startups had to be taken to China. At that time, there weren't many Chinese entrepreneurs. Most were white or local entrepreneurs, and they were all taken to a building in Huaqiangbei and had to stay there for three months.

Why? Because every hardware iteration required being able to go downstairs and buy the part they needed, whether to adjust the architecture of their new hardware or to pick up a new resistor or capacitor. In the United States, it's very difficult to do this. In fact, even today, many people still place orders on Taobao and then wait for long-distance logistics to deliver to the United States. Hardware iteration is indeed quite difficult in the United States, and this really does hinder commercialization.

In China, although the hardware supply chain is very strong, I've always held the view of "software-defined and software-driven." If you don't have a foundation model or the support of a large model like VLA, you can't achieve full commercialization through supply chain progress alone.

So in the end, the two sides are likely to be interconnected, advancing side by side, learning from and integrating with each other.

Image source: Zhipu Robotics

Christine: China is now in the early stage of commercialization, but the conclusion is actually the same. It's hard to say who will enter large-scale commercialization first.

Among China's advantages in supply chain, cost, scenarios, and data, I think the biggest is the openness of scenarios and data.

Let me give you an example. A robot company is conducting a pilot on the production line of Mercedes-Benz. How did they do the demo? Because the production line data in foreign countries is very sensitive, they built a small black room on the production line, like a tent, and let the robot perform repetitive actions inside this small black room. This is the production line or production scenario