The price of humanoid robots has plummeted, making them cheaper than iPhones: an industrial race focused on productivity rather than form.
Recently, humanoid robots have reached a historic turning point, shifting from "luxury items" to "affordable commodities." Engineering prototypes that a year ago required queuing and pre-ordering at nearly one million yuan are now being cleared out in bulk on second-hand websites and in disassembly markets at "50,000 yuan per batch." In terms of unit price, some are even cheaper than high-end flagship phones.
The Unitree G1 now starts at 85,000 yuan. The consumer-grade, entry-level R1 Air sells for just 29,900 yuan, and Songyan Power's Bumi has dropped to 9,998 yuan, cheaper than a high-spec iPhone. Meanwhile, the localization rate of China's humanoid robot supply chain has exceeded 90%. According to a recent Morgan Stanley report, of the 13,000 to 16,000 humanoid robots shipped globally in 2025, about 90% will come from Chinese manufacturers.
If AI has begun to serve as the productive force of the digital world, then rapidly developing robots are expected to become the productive force of the physical world. Amid the frenzy of plunging prices, a question has emerged: will the future need only humanoid robots?
The research data from Gartner presents a sobering fact: the "real-world ratio" of humanoid robots is only 1:60. Approximately 98.36% of surveyed customers are still in the exploration stage, and only 1.64% have actually deployed them. "From a practical or rational perspective, we believe that future robots do not necessarily have to look exactly like humans," said Gao Ting, vice president of research at Gartner.
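The two percentages follow directly from the 1:60 ratio if it is read as one deploying customer for every 60 still exploring (that reading is an assumption; the article does not spell it out). A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def ratio_to_shares(deployed: int, exploring: int) -> tuple[float, float]:
    """Convert a deployed:exploring ratio into percentage shares of all customers."""
    total = deployed + exploring
    return 100 * deployed / total, 100 * exploring / total

# Assumed reading of "1:60": 1 deployed customer per 60 exploring customers.
deployed_pct, exploring_pct = ratio_to_shares(1, 60)
print(f"deployed: {deployed_pct:.2f}%, exploring: {exploring_pct:.2f}%")
# prints "deployed: 1.64%, exploring: 98.36%"
```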
Recently, we interviewed Gao Ting. In the conversation, he pointed out that although the human world is designed around the human body, the human body is not necessarily the best form for robots. Successful robots should improve on the human form rather than simply copy it. For example, the Digit robot tested by Amazon has knees that bend backward, allowing it to squat in front of shelves more efficiently than humans. 1X's Eve robot uses a wheeled, self-balancing chassis to move more efficiently in flat indoor environments. Efficient robot shapes can be canine-like, wheeled, or entirely new forms designed around task requirements. These forms are usually more stable, cheaper, and faster than humanoid shapes.
"Don't be restricted by the 'humanoid' appearance. Instead, prioritize finding specific vertical application scenarios that can quickly deliver value and generate revenue." In addition, Gao Ting laid out the real picture of the current robot industry across multiple dimensions, including technology, hardware, application scenarios, and real-world challenges.
What can today's robots do?
"In the short term, the focus should not be on 'whether the robot looks like a human,' but on 'whether the robot can stably complete tasks, reduce costs, reduce dependence on human labor, and improve operational efficiency in a clearly defined scenario.'"
So, what can today's robots do?
"Judging from existing successful cases, what the robot looks like is not the most important thing. Whether it's the robotic arm in the warehouse or the handling robot in the automotive factory, the scenarios that are relatively easy to implement all have one thing in common: the environment is relatively fixed. What enterprises ultimately care about is whether the robot can do the job well and whether the cost-benefit analysis makes sense, rather than whether it looks like a human."
Gao Ting summarized that the robot scenarios most likely to generate returns on investment at this stage usually share three characteristics: clear task boundaries, repeatable processes, and a relatively limited range of abnormal situations. For example, in industrial in-line logistics, warehouse handling, and some service processes, on-site modifications can reduce the complexity that robots need to handle. The home scenario is different: the task combinations are more scattered, the environment changes more frequently, and any mistake may directly affect personal safety. Home robots therefore require not only stronger model capabilities but also more mature engineering reliability and safety mechanisms.
Where are the opportunities during the window period of large-scale industrialization?
"Robots are in the window period of moving towards large-scale industrialization," is how Gao Ting characterized the current stage of the robot industry. "Currently, robots have been successfully applied in some industrial and commercial scenarios. Especially in factories and warehouses, robots can already replace some human labor in high-frequency repetitive work. So capital is very interested in this field, and the valuations of some leading companies have also increased rapidly."
However, large-scale industrial implementation has not yet occurred.
Tesla once set a goal of producing about 5,000 Optimus robots in 2025. But when it released its fourth-quarter 2025 financial report, Elon Musk admitted that Optimus was performing only some basic tasks in the factory and had not yet become a real productive force. Tesla's third-generation humanoid robot is reportedly expected to be unveiled in the middle of the year, with mass production not starting until July-August 2026.
"There is a significant gap between the actual implementation of robots, especially humanoid robots, and public expectations. This is the current situation," Gao Ting said.
In Gao Ting's view, it will still be difficult for humanoid robots to achieve full, large-scale commercialization in the next two to three years. "These humanoid robots may continue to appear in relatively fixed environments such as factories, warehouses, and automotive manufacturing to perform some repetitive and low-complexity tasks. However, they will mainly be used in pilot projects and small-scale deployments rather than fully replacing the labor force. In contrast, the commercialization paths of industrial robots, warehousing robots, service robots, and some multi-functional robots for specific tasks are clearer, because their task boundaries are relatively clear, the input-output ratio is easier to calculate, and safety and processes are easier to control."
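One simple way to make the input-output ratio concrete is a payback-period estimate. The sketch below is illustrative only: the function name and every figure in it are hypothetical and do not come from Gartner or the article.

```python
def payback_months(upfront_cost: float,
                   monthly_saving: float,
                   monthly_running_cost: float) -> float:
    """Months needed for net monthly savings to cover the upfront investment."""
    net_monthly = monthly_saving - monthly_running_cost
    if net_monthly <= 0:
        # The deployment never pays for itself under these assumptions.
        return float("inf")
    return upfront_cost / net_monthly

# Hypothetical pilot: 300,000 yuan for robot plus integration,
# 20,000 yuan/month in labor savings, 5,000 yuan/month in upkeep.
months = payback_months(300_000, 20_000, 5_000)
print(f"payback in {months:.1f} months")  # prints "payback in 20.0 months"
```

A task with clear boundaries makes each of the three inputs easier to estimate, which is why such scenarios are easier to justify.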
For enterprises that want to purchase robots, his advice is: "First, the starting point should not be 'buying a humanoid robot,' but finding high-value, low-complexity tasks. Second, treat the robot as an operational transformation project rather than simply a hardware purchase. Buying the robot body alone cannot ensure successful implementation; on-site processes, space layout, IT/OT integration, and employee collaboration also need to be considered. Third, start with pilot projects and then expand. Begin with one or two small, enclosed scenarios, and then replicate after obtaining real productivity results. Fourth, consider mature product categories first, such as AMRs, robotic arms, and collaborative robots, and then track the maturity of humanoid robots over the long term."
"People in the industry often compare the current development stage of embodied intelligence to the early stage of large models: the technical direction is gradually becoming clear, but the capacity for large-scale implementation has not yet formed," Gao Ting explained. "This analogy makes some sense, but the industrialization of robots is more difficult, because it is constrained not only by data and computing power but also by sensors, drive systems, power management, and system reliability."
"VLA is still an important route, and the world model is accelerating its integration into the robot system"
What a robot can and cannot do fundamentally depends on its "brain," that is, the robot model.
Gao Ting said, "VLA is a relatively mature technical route for general robot models at present." VLA stands for Vision-Language-Action. The "Language" part comes from large language models. A VLA model enables the robot to generate corresponding actions by combining environmental information and task instructions.
"The role of the language model is to provide the robot with semantic understanding, common sense, and task-planning capabilities. For example, when the user says 'The room is too dark,' the robot needs to understand the task goal behind this statement and decide whether to turn on the light."
Unlike previous paradigms, the VLA approach reverses the logic: it starts with generalization ability and then improves reliability scenario by scenario. "Previously, the approach was to first solve the high-reliability problem in a specific scenario and then try to generalize. For example, first make a robot perform a certain action with very high reliability and then try to make it learn other tasks. However, you will find that it is difficult to truly achieve generalization in this way. You can only get a very specialized robot that performs poorly when the task changes."
Regarding the newer technical route of the world model, Gao Ting said, "It provides another idea: to let the system learn the state changes and causal relationships in the physical world and predict the possible results of actions. It does not necessarily rely on language as an intermediate layer and emphasizes the modeling of physical laws. Think of a skilled driver who sees a puddle ahead. The driver does not need to spell out in words 'There is water here, it may be slippery, I need to slow down.' Instead, vision directly triggers a physical prediction of the vehicle's trajectory, and the driver instinctively steps on the brake. What the world model aims to provide is the ability to make direct judgments without first translating into language."
However, he believes that "currently, the leading routes for general robots and humanoid robots are still mainly VLA. Although the world model is developing rapidly, it is currently used mainly for synthetic data generation, simulation, evaluation, and auxiliary planning. Cases of using it directly to control physical robots are still at an early stage. In the next one or two years, VLA will most likely remain the backbone of the robot action model, but the world model will gradually be integrated into the VLA system, providing the robot with stronger physical understanding, planning, and pre-rehearsal capabilities. In the long run, we are more likely to see VLA and the world model integrated than to see the world model simply replace VLA."
Gao Ting pointed out that VLA is currently one of the general robot technical routes closest to engineering implementation. It still has a long way to go to achieve human-like flexible and general intelligence, but it has shown good practical value in scenarios with relatively clear boundaries, such as warehousing and manufacturing. In the future, VLA is likely to remain the main route for robot industrialization.
The indispensable 'dexterous hand': Multiple engineering trade-offs in robot mass production
If the model is the "brain" of the robot, then the dexterous hand is its most important "tool." "For robots that need to manipulate objects, the end-effector is crucial; in general operation scenarios, the dexterous hand is especially important."
Gao Ting said that not every robotic hand can be called a "dexterous hand." It must have sufficient degrees of freedom, be able to perform fine operations, and adapt to grasping different objects.
In the past few years, dexterous hands have made significant progress: degrees of freedom are increasing, and prices are falling. However, Gao Ting pointed out that "the difficulty of dexterous hands is not just increasing the degrees of freedom. For industrial applications, it is more important to balance grasping accuracy, force output, durability, and maintenance costs in a limited space. The product with the most degrees of freedom may not be the most suitable for mass production. Different tasks require different trade-offs between performance and reliability."
He gave an example: "Some high-end overseas dexterous hands can approach the human hand in degrees of freedom and adaptive grasping ability through high-density sensor stacking and complex tendon-driven systems. However, their prices are usually high, often reaching tens of thousands or even hundreds of thousands of yuan, making large-scale deployment difficult. Some entry-level products costing a few thousand yuan, as well as open-source products, have lowered the barrier to entry, but their fingertip force output, durability, and sensing accuracy still need further verification, and they cannot yet directly replace human labor."
The data gap: The gap between simulation and reality, and between machines and humans
Today, the robot industry still faces multiple challenges, and one of the core bottlenecks is the lack of high-quality data. "Data is still the first threshold for robots to move towards large-scale implementation."
Gao Ting said that the data used to train large language models comes from the Internet and is relatively easy to obtain. By contrast, the real-world operation data used to train robots, such as teleoperation data, is costly to collect.
Since real data is difficult to obtain, can simulation data be used as a substitute? This touches on the second challenge: the gap between simulation and reality. Gao Ting pointed out that NVIDIA is focusing on building out simulation and synthetic data toolchains. Training, testing, and validating robots in a virtual environment can expand the scale of training data and reduce the cost of trial and error in the real world. The advantages are low cost and easy scalability. However, there is an important problem: the simulated scenario always differs from the real world. "No matter how well the simulation is done, there is still a difference from the real world. Even if the robot has completed countless perfect action mappings in the virtual engine, once it faces slight changes in friction, material, or light in the real world, the control strategy learned in the virtual environment may fail. So simulation data is useful, but it cannot completely replace real data for now."
Some people also ask: can robots be trained directly on the vast amount of video on the Internet? The cost is low, and the data is easy to obtain. However, this brings a new challenge: the embodiment gap. Simply put, the human body and the robot body are different. It's like "the eyes have learned, but the hands may not have." As a result, transferring human behavior videos or motion data directly to robots is far less efficient than it first appears.
"The more realistic route in the future is not 'relying only on simulation,'" Gao Ting said. "Instead, it is to establish a hybrid data strategy: take real robot interaction data as the core, including teleoperation, manual teaching, and on-site operation feedback; then combine human behavior data such as motion capture and first-person videos, as well as simulation/synthetic data, to improve the generalization ability and reliability of the robot model."
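As a rough illustration of what such a hybrid strategy could look like in a training pipeline, the sketch below draws each training example from one of the three data sources according to fixed mixing weights. The weights, source names, and function name are all hypothetical; the interview gives no numbers, only that real robot data should be "the core."

```python
import random

# Hypothetical mixing weights. The only constraint taken from the article
# is that real robot interaction data carries the largest share.
SOURCE_WEIGHTS = {
    "real_robot": 0.5,      # teleoperation, manual teaching, on-site feedback
    "human_behavior": 0.25, # motion capture, first-person video
    "simulation": 0.25,     # simulated / synthetic data
}

def sample_sources(batch_size: int, weights: dict[str, float], seed: int = 0) -> list[str]:
    """Pick a data source for each example in a batch, proportional to its weight."""
    rng = random.Random(seed)  # seeded for reproducibility
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=batch_size)

batch = sample_sources(1000, SOURCE_WEIGHTS)
counts = {name: batch.count(name) for name in SOURCE_WEIGHTS}
print(counts)
```

In practice the weights would be tuned empirically, and they might shift over time as more real-world data becomes available.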
In addition to data, cost is also one of the challenges faced by the robot industry. "For robots to enter all industries, they must be affordable enough." However, Gao Ting also pointed out that China's supply chain is a huge advantage. "The cheapest humanoid robot R1 Air of Unitree Technology is priced below 30,000 yuan. Although it cannot really work in factories and is mainly used for scientific research, the advantage of China's robot supply chain is already very obvious."
Prices are dropping, the supply chain is maturing, the VLA paradigm has been proven feasible, and capital is flowing in. However, the data gap, hardware bottlenecks, and cost problems still stand like three high walls on the way to full-scale adoption.
For this industry, the most rational attitude may not be to chase the gimmick of "humanoid" but to return to a simple question: What practical problems can this machine actually solve for humans? As Gao Ting said, "Don't be obsessed with whether it looks like a human." What is more important is "usefulness." And more important than the price is the value. This industrial race focused on productivity rather than form has just begun.
This article is from the WeChat official account "AI Frontline" (ID: ai-front), written by Hua Wei and published by 36Kr with authorization.