
Starting from scratch on a trillion-dollar track: are humanoid robots the hope of the entire AI community?

Dolphin Research, 2025-12-26 07:26
Humanoid robots, born with AI

As of 2025, the wave of artificial intelligence represented by OpenAI has been running for three years, and the industry is still investing heavily in AI. Based on its recent analyses, Dolphin Research believes that in 2026 the key for AI lies in reducing computing-power costs and turning AI investment into real deployments on both the software and hardware sides. Among these, the deployment of AI in new hardware is the real incremental opportunity.

Against this backdrop, mass production of Tesla's Optimus is approaching. Humanoid robots are expected to become the main physical carrier of AI and to profoundly change how humans interact and produce.

Based on this, Dolphin Research has launched a study of the robot industry chain. This first article takes an industry and fundamentals perspective: starting from the upstream, it analyzes the difficulties and opportunities in producing humanoid robot components and driving down their cost. It focuses on the following questions:

1. What are the components of the humanoid robot industry chain?

2. What are the industrialization difficulties of these hardware components?

3. Where should we look for hardware opportunities in humanoid robots?

The detailed analysis follows.

1. Humanoid Robots: Born with AI

Let's start with a basic concept: the two core features of a humanoid robot are a human "physical form" and a human-like "brain". Being human-shaped, it generally has arms, legs, and a head and can walk upright; this form is the basis for performing diverse tasks.

With the "brain" of a human, the core lies in multimodal perception ability, continuous learning, and decision - making ability. The combination of a human - like form and a human - like brain aims at generality. Simply put, it should not only be able to stand and walk, but also be able to carry boxes, make coffee, lift heavy objects, and even work in a factory to tighten screws.

Moreover, these skills are not pre-programmed: the robot learns them continuously while interacting with all kinds of external information and, on that basis, makes decisions on its own.

Achieving the general capability that humanoid robots require takes computing power, algorithms, data, and tight coupling of software and hardware, none of which can be missing. The progress of AI algorithms and GPUs/ASICs over the past two years has made rapid iteration of computing power and algorithms possible, but the hardware constraint is entirely different from the past.

When mobile phones and automobiles became intelligent, phones already had communication functions and cars already provided transportation, so both shipped at large scale even before they were smart. A humanoid robot without an intelligent brain, by contrast, is basically a human-shaped lump of metal: its shipments must be supported by an AI brain, and without AI it cannot ship in meaningful volume.

As a result, for humanoid robots, a new category of the AI era, the hardware itself is the bigger constraint:

First, as an emerging category, humanoid robots have hardware requirements very different from those of other industries, and some hardware has to be developed from scratch. For example, humanoid robots need an extremely sensitive sense of touch, yet both the tactile hardware and the tactile data in this area are still largely blank.

Second, hardware costs must be low enough. Since a large part of the purpose is to replace human labor, outside estimates put the fleet at 1 billion humanoid robots by 2050, against a global population of about 8 billion and roughly 5 billion connected people. Without an affordable price, a penetration rate comparable to that of automobiles is impossible.

2. Dissection of the Industry Chain Links: Still in the "Growth Period" of Components

According to Musk, the price of a single robot should ideally stay within $20,000, similar to the starting price of smart electric vehicles. And like the EV industry chain, the robot industry chain is highly complex.

From the perspective of the industry chain, the industry can be roughly divided into the upstream, midstream, and downstream.

1) Upstream: Focus on the cooperation model between OEMs and upstream suppliers

The upstream refers to the OEMs' various suppliers, covering actuators, sensors, encoders, controllers/drives, and the integrated modules built from this hardware, as well as computing infrastructure, algorithms, chips, and so on.

It is worth emphasizing that the hardware links involved in humanoid robots overlap heavily with the automotive industry, especially new energy vehicles. As in the NEV industry chain, the cooperation models between humanoid robot OEMs (analogous to NEV OEMs) and their suppliers are diverse.

Take Tesla as an example: it can purchase components directly and assemble the modules and finished products itself, or it can purchase modules and assemblies (such as dexterous hands and some body joints). At present, however, as in the new energy vehicle industry, the main cooperation mode is for suppliers to provide modules and assemblies to robot OEMs.

2) Midstream: Cross-sector entry by automakers plus new startups

OEMs can be intuitively understood as companies that manufacture and sell humanoid robots. Currently, the major humanoid robot OEMs are basically concentrated in China and the United States.

Apart from Tesla and XPeng, most of these companies are startups. At this early stage there are not many cross-sector players, but those with capital behind them are the more powerful ones.

3) Downstream: End demand, with market potential depending on product capability

Because general capability is still lacking, applications today are concentrated in specialized scenarios such as scientific research, education, and guided tours. General scenarios such as industrial and household use have great potential, but current products do not yet meet the conditions for commercial deployment. How long it will take robots to become more general is a question we will discuss later; this article focuses on upstream progress by dissecting the hardware links of humanoid robots.

3. In-Depth Dissection of the Upstream: Where Is the Gold Mine?

Musk has boasted that humanoid robots will be a trillion-dollar business. He recently said that Tesla's Optimus Gen 3 will show a prototype in the first quarter of 2026 and enter mass production by the end of 2026 (with an initial annual output of about 50,000 units), on a production line designed for an eventual annual output of 1 million units. The fourth-generation Optimus is planned for a capacity of 10 million units, and the fifth generation may reach 50 million to 100 million units.

If Tesla ultimately overcomes these difficulties, the robot market is clearly very large. In this article, Dolphin Research therefore dissects the value chain of the core upstream hardware, taking the Optimus humanoid robot as the example.

Structurally, the Optimus humanoid robot can be roughly divided into the head, the body joints, and the dexterous hands. The figure below lists the main parts of Tesla's Optimus Gen 2, where each component sits, and our estimated cost per robot:

First, from the perspective of technical architecture, a humanoid robot can be roughly divided into three layers: perception, decision-making, and execution, similar to a smart car but with higher complexity. Specifically:

1) Perception layer: Mainly includes various sensors and the brain.

The brain mainly refers to artificial intelligence models, which will not be discussed here for the time being. The sensors mainly include visual sensors, tactile sensors, torque sensors, position sensors, etc.

① Visual sensors: Tesla adopts a pure 2D vision solution

What are visual sensors? Simply put, they can be understood as human eyes, mainly used to capture light signals for environmental perception, object recognition, and navigation and positioning.

Which solution? The technical route is still debated. Tesla uses a pure-vision solution with only 2D cameras, while most other players use multi-sensor solutions that combine 3D cameras (structured light, ToF, or binocular vision), lidar, and millimeter-wave radar.

What's the difficulty? The visual sensors used in humanoid robots do not differ fundamentally in technical route from those used in consumer electronics or in NEV assisted driving; the main difference is the higher requirements for dynamic performance, real-time operation, integration, and low power consumption.

Currently, the main provider of 3D cameras is Orbbec, which has cooperated with many domestic humanoid robot OEMs; lidar is reused from the automotive industry, and the main suppliers are Hesai Technology and RoboSense. Dolphin Research has conducted a separate analysis on Hesai, so it will not be repeated here.

In the value composition of Tesla's robot, however, only three 2D cameras are needed and each is worth only about 350 yuan, so it is hard for suppliers to see a meaningful increment while robot shipment volumes remain small.

② Tactile sensors: The core bottleneck, and the technology has not yet converged

What are tactile sensors? Understood simply as the robot's skin, they are used mainly on the hands. Their function is to sense and measure the interaction forces generated on contact with external objects, including pressure, texture, friction, temperature, and so on, which is why they are also called "electronic skin". They are currently the main difficulty in the hardware links of humanoid robots.

Tactile sensors are a new field that has emerged with the robot category, and their use in other industries is very limited. They demand high accuracy and sensitivity, and must also offer consistency, flexibility, high reliability, durability, and ease of integration.

For example, the accuracy requirement is constrained mainly by physical principles, miniaturized integration, and dynamic response. Insufficient accuracy can cause "signal distortion", leading the AI model to learn spurious patterns.

Data consistency is constrained mainly by the manufacturing process: in mass production, small fluctuations in material uniformity, process parameters, and the like produce significant differences in output characteristics across sensors in the same batch.

In addition, performance drift after long-term use also takes a toll. Together, these factors make the model either overfit or train on excessively noisy data, ultimately causing generalization to fail.

What's the difficulty? On the production side, the barriers for tactile sensors lie in material selection (sensitive materials, flexible electrodes, etc.), structural design, manufacturing and packaging processes (photolithography, 3D printing, etc.), and signal-processing algorithms (multi-dimensional quantities must be decoupled from a single physical signal), all of which demand strong overall capability from manufacturers.

Which solution? The two main technical routes are piezoresistive and capacitive. A piezoresistive sensor converts the applied force into a resistance change that is read out as an electrical signal; its structure is relatively simple, but its dynamic performance and consistency are relatively poor. A capacitive sensor converts the applied force into a capacitance change; its dynamic performance and consistency are better than the piezoresistive type, and although its maturity is lower, it is expected to be the main direction of development.
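
As a rough illustration of the two transduction principles (standard textbook relations with symbols of our own choosing, not any vendor's design):

ΔR / R ≈ GF · ε   (piezoresistive: the strain ε produced by the applied force changes the element's resistance R, scaled by the gauge factor GF)

C = ε₀ · εᵣ · A / d   (capacitive: the applied force compresses the dielectric gap d between electrodes of area A, so the capacitance C rises)

In both cases the downstream electronics convert the resistance or capacitance change into a voltage, which is then digitized and calibrated against known loads.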

For now, the value contribution may not be high, though it could rise in the future. By our calculation, Optimus' hands need more than 10 tactile sensors, corresponding to a current value of about 3,000 yuan per robot. Once the industry matures, this cost should fall to about 1,500 yuan, only around 2% of the robot's total value.

Note, however, that the technical route for tactile sensors has not yet converged. Given that the capacitive type may replace today's piezoresistive type, and that future sensors will also need arrayed layouts and multimodal perception, the value of tactile sensors could rise further.

Supplier landscape: The leading players are mainly overseas companies such as Novasentis, Tekscan, JDI, Baumer, and Fraba, based chiefly in the United States, Japan, and Europe.

Chinese companies are also accelerating their efforts. Those making relatively fast progress mainly include: Coretronic Sensing, which has invested in tactile-sensor R&D companies such as Tashan Technology and Yuansheng Xianda;

Hanwei Technology, which has cooperated with many humanoid robot OEMs and is building production lines;

Fulai New Materials, which has built a pilot production line and is supplying products to many humanoid robot OEMs.

③ Torque sensors: Six-axis force sensors are the key, and domestic substitution is needed

What are torque sensors? They are sensors for measuring force and torque. Intuitively, think of unscrewing a bottle cap: as you twist it, you can feel how much force is needed, and that is what a torque sensor measures. The technical barrier of an ordinary single-axis torque sensor is not high, so here we focus on six-axis force sensors.

What are six-axis force sensors? They measure forces along three axes and torques about three axes simultaneously. Tesla's Optimus Gen 2 carries four six-axis force sensors in total, placed at the wrists and ankles; they are the core sensors for the robot's motion control.
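
For later reference, the six measured components are conventionally written as a wrench (our notation for illustration, not Tesla's):

w = (Fx, Fy, Fz, Mx, My, Mz)

that is, forces along and torques about the sensor's three axes; the decoupling and calibration steps discussed below aim to recover exactly these six numbers from the raw signals.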

Figure: Schematic diagram of the elastic-body structure of a six-axis force sensor

Source: Dolphin Research

What's the difficulty? Six-axis force sensors for humanoid robots face high requirements on integration, dynamic performance, overload capacity (the ability to withstand instantaneous impacts), and accuracy. The product barriers are relatively high, and the main difficulties are as follows:

a. Structural design: Maintain high sensitivity under small deformations;

b. Decoupling algorithm: Recover the six force and torque components from the raw signal while avoiding crosstalk between dimensions (see the sketch after this list);

c. SMT process: Overcome the problem of poor consistency in traditional processes;

d. Calibration process: Determine the exact mapping between the sensor's output signals and the actual physical quantities through testing and computation; the calibration dimensions are far more numerous than for an ordinary single-axis torque sensor.
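
The decoupling (b) and calibration (d) steps above are two sides of the same problem. A common textbook approach, shown here only as a minimal sketch under our own assumptions (synthetic data, a made-up channel count, a purely linear model), not any supplier's actual pipeline, is to model the output wrench as a linear map of the N raw strain channels and fit the 6×N decoupling matrix by least squares from known applied loads:

```python
import numpy as np

# Minimal sketch: least-squares decoupling/calibration of a six-axis force sensor.
# Channel count, data, and the linear model are illustrative assumptions.
rng = np.random.default_rng(0)
n_ch, n_cal = 8, 500                      # raw strain channels; calibration samples

# 1) Calibration: apply known wrenches w = (Fx, Fy, Fz, Mx, My, Mz) on a test rig
#    and record the raw readings. Here we simulate a linear sensor with
#    cross-axis coupling (crosstalk) plus a little measurement noise.
W_known = rng.uniform(-50, 50, size=(n_cal, 6))                   # applied loads
A_hidden = rng.normal(size=(n_ch, 6))                             # unknown channel response
S = W_known @ A_hidden.T + 0.01 * rng.normal(size=(n_cal, n_ch))  # raw readings

# 2) Fit the 6 x n_ch decoupling matrix C so that C @ s reproduces the known loads:
#    solve S @ X ≈ W_known in the least-squares sense, then C = X.T.
X, *_ = np.linalg.lstsq(S, W_known, rcond=None)
C = X.T

# 3) In operation, every new raw reading is decoupled into a wrench estimate.
s_new = A_hidden @ np.array([10.0, 0.0, -5.0, 0.2, 0.0, 1.5])     # simulated reading
w_est = C @ s_new
print(np.round(w_est, 2))   # close to [10, 0, -5, 0.2, 0, 1.5]
```

A real calibration also has to cope with nonlinearity, temperature drift, and overload events, which is a large part of why the barrier is higher than this linear sketch suggests.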

The value contribution is not high. By our calculation, Optimus needs four six-axis force sensors, corresponding to a current value of about 5,400 yuan per robot. Once the industry matures, this cost should fall to about 3,200 yuan, around 3% of the robot's total value.