Embodied intelligent robots still face these challenges before real-world deployment | Focus analysis
Author | Huang Nan
Editor | Yuan Silai
In 2025, tasks such as arranging tablecloths and closing food storage boxes seem rather trivial for a robot competition.
However, these actions that humans can easily accomplish remain difficult for robots to this day.
In late May, at the ICRA WBCD 2025 event in Atlanta, USA, a Xinghaitu dual-arm robot slowly moved tennis balls into packing cartons, the semi-humanoid ARX robot repeatedly opened and closed the latches of food storage boxes, and the Songling robot installed silicone columns and handled transparent test tubes on a simulated laboratory bench.
These demonstrations are far from the "embodied intelligence" humans dream of and can be described as rather rudimentary.
More than a month earlier, at the Beijing Yizhuang Humanoid Robot Marathon, multiple participating robots fell one after another, dramatically exposing the technological shortcomings beneath the surface of intelligence.
These scenes have somewhat punctured the market's long-standing irrational expectations. After a frenzied influx of capital, some embodied intelligence companies have seen their valuations skyrocket in a short period, even though they lack the ability to deliver projects and have no practical applications to support those valuations. Clearly, a crisis looms under this imbalance.
However, it should also be noted that embodied intelligence manufacturers are accelerating the convergence of their respective paths.
Currently, the industry has reached a general consensus on the applications of embodied intelligence robots. Industrial manufacturing, logistics and warehousing, biomedicine, and commercial services are regarded as the core scenarios.
Of these scenarios, industrial manufacturing is where robots are already relatively common. Take the US market as an example. Atlanta is one of the country's transportation and logistics hubs, and many warehousing and logistics companies have set up warehouses there. These companies need to hire a large number of local workers, but labor is expensive. According to data from websites such as Indeed and Glassdoor, warehouse workers earn $20 to $32 an hour. Assuming an average hourly wage of $25 and 2,000 working hours a year, the annual salary comes to $50,000.
Therefore, many logistics warehouses have achieved a high degree of automation. A representative example is the "goods-to-person" warehousing network built by Amazon Kiva.
Robots are still far from replacing humans, but they can already take on some of the burdens.
"Even a breakthrough in a single minor aspect, with the amplification effect of large-scale operations of robotic equipment, can significantly improve the accuracy and flexibility in unstructured environments, thereby greatly enhancing the efficiency of the entire system. These breakthroughs in aspects that meet practical needs and have commercial value are driving the leap of robots from 'command execution' to 'intelligent autonomy,'" Xu Zhuo, the initiator of WBCD (What Bimanual Can Do, a challenge to explore the ability boundaries of dual-arm robots) and a member of the DeepMind robot team, told Yingke.
Problems in the Three Core Scenarios
Compared with other fields, industrial scenarios such as factory workshops and logistics warehouses are highly structured and have a more stable working environment. In addition, a large amount of real-world data has been accumulated from equipment such as heavy-duty robotic arms, collaborative robots, logistics unmanned vehicles, and AGVs in the past, which can be directly reused in the training of embodied robots.
However, in the logistics packaging process, automation technology has not yet effectively covered and solved all the problems.
Today, many warehousing centers around the world still rely heavily on manual operations. Workers need to flexibly adjust the packing methods and complete precise packaging according to the shape, size, and characteristics of the products. This also reflects the fundamental challenges of existing robot technologies in flexible operation and environmental adaptation.
Yu Lei, the co-founder of Xinghaitu, told Yingke that the packaging process, which seemingly should be highly automated, still requires human labor for three reasons. First, products come in diverse forms, with significant differences in size, weight, and shape among categories, so packing strategies must be adjusted dynamically. Second, the packaging process is complex: products must be placed properly and boxes closed firmly, which requires fine actions such as multi-finger coordination and force-control sensing. Third, single-arm operation is limited and struggles with tasks that require "holding the box with one hand and packaging with the other." Traditional dual-arm solutions are also restricted by their algorithms and lack flexibility.
Scene at ICRA WBCD 2025 (Source/WBCD)
It is even more difficult to implement automation technology in the biomedicine field.
For example, in pharmaceutical laboratories, repetitive basic tasks such as test-tube handling and pipetting consume a lot of human labor and are hard to perform with consistent precision, which has given rise to a sizable experiment outsourcing industry.
Chen Zhigang, the founder of Hetan Intelligence and the former CDO of WuXi AppTec, pointed out that the "purification" process alone, which involves separating the target product from the synthetic reaction mixture, consumes a large amount of human labor. "Although there are automatic column chromatography machines on the market to complete some steps, full-process automation still runs into difficulties in the details. Actions such as alignment, connection, and pressure stabilization require high-precision coordinated control."
Take WuXi AppTec, a leading pharmaceutical company with a market value of over 200 billion RMB, as an example. It undertakes some research tasks from European and American laboratories and has established teams in regions with relatively low labor costs in China and other Asian countries to handle repetitive and low-tech experimental processes.
However, even with relatively cheap labor, companies still hope to introduce robots to improve efficiency. "Due to the complexity of many biological experiment processes and the limited degrees of freedom of existing robot end-effectors, their intelligence and dexterous operation capabilities are restricted. Therefore, it is difficult for them to imitate the high flexibility and dexterity of human hands when handling biological samples," Sun Lingfeng, a researcher at the Robotics and AI Institute (RAI), told Yingke.
The Xinghaitu robotic arm used in the WBCD logistics packaging scenario (Source/WBCD)
Zeng Yiheng, the North American head of Songling Robot, also told Yingke that deployment is even more difficult in the biomedicine scenario. First, medical devices are generally expensive, so the best possible operating results have to be achieved within a limited budget. Second, fine actions such as test-tube recognition and force control are hard to perform.
"There can be various ways to complete tasks, but in specific implementation, ensuring data quality, system stability, and the quality of experiment completion are challenges that must be faced," Zeng Yiheng said.
The Songling robot in the WBCD life science experiment challenge (Source/WBCD)
The third type of scenario, which is also the ultimate goal of all embodied robot manufacturers, targets consumers and aims to bring robots into ordinary households.
Currently, some tasks are ready for initial implementation. Take home kitchens and some commercial catering establishments as examples. The operation processes such as food ingredient processing and tableware cleaning are highly standardized and patterned, which provides an ideal application scenario for robots to replace humans.
However, only a few companies are testing the waters in this scenario. Seemingly simple daily operations such as laying tablecloths and packing food into storage boxes are actually complex and involve long sequences of steps. Moreover, relevant training data sets and demos are lacking, so there is no effective reference path to follow. Zhang Xinliang, the CEO of ARX (Ark Infinity) Robot, believes that force-feedback-sensitive tasks involving flexible objects and dual-arm coordination will be the direction the industry needs to work on together in the next one or two years.
The ARX robot used in the WBCD table arrangement scenario (Source/WBCD)
The problems faced by embodied intelligence, although originating from different industries and scenarios, essentially boil down to one point: the complexity of the real world far exceeds the capabilities of current software and hardware. Whether it is the machinery itself or the automation system, it will take a long time to break through the bottlenecks.
Multiple Solutions Await Validation
Based on the demonstrations of dozens of companies and research institutions at the WBCD event, the embodied intelligence solutions can be generally divided into three categories.
The first category is teleoperation, including remote teleoperation and a new type of teleoperation system combined with actuator hardware.
Both control the robot in real time through interactive devices or communication links. The advantage is more decisive and dexterous operation, which compensates for the robot's insufficient autonomy in complex and unstructured environments. For example, Aiou Intelligence, an embodied intelligence solution provider, deployed its control terminal at its Shenzhen headquarters, from which technicians remotely operated the robots at the US venue. FrodoBot's solution enabled remote operation from New York to Atlanta.
Yingke has learned that this "remote operation" model is not only suitable for data collection scenarios but can also optimize costs through cross-regional scheduling of human resources. A straightforward example: by having operators in Southeast Asia remotely manage robots in European and American warehouses, the labor cost can be reduced to between one-third and one-tenth of the local level.
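To make the scale of that saving concrete, here is a minimal arithmetic sketch in Python. It assumes, purely for illustration, that the "local level" equals the $50,000 annual warehouse salary estimated earlier for Atlanta ($25 an hour for 2,000 hours); the article does not pair these two figures itself, and the function names are hypothetical. Robot hardware, connectivity, and management costs are not modeled.

```python
# Illustrative arithmetic only. Pairing the Atlanta wage estimate with the
# one-third to one-tenth ratio is an assumption made for this sketch.
HOURLY_WAGE_USD = 25      # assumed average hourly wage quoted for Atlanta
HOURS_PER_YEAR = 2000     # assumed annual working hours


def annual_labor_cost(hourly_wage: float, hours: int) -> float:
    """Annual cost of one on-site worker."""
    return hourly_wage * hours


def remote_operator_cost(local_cost: float, divisor: int) -> float:
    """Annual cost of a remote operator paid 1/divisor of the local level."""
    return local_cost / divisor


local = annual_labor_cost(HOURLY_WAGE_USD, HOURS_PER_YEAR)
high = remote_operator_cost(local, 3)    # one-third of the local level
low = remote_operator_cost(local, 10)    # one-tenth of the local level

print(f"Local worker:    ${local:,.0f} per year")
print(f"Remote operator: ${low:,.0f} to ${high:,.0f} per year")
```

Under these assumptions, one remotely staffed position would cost roughly $5,000 to $16,700 a year in labor, against $50,000 for an on-site worker.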
The team from Carnegie Mellon University at the ICRA WBCD 2025 (Source/WBCD)
The second category is the dexterous hand solution, which is currently quite popular in the market. It simulates the human hand through mechanical structures and integrates perception and motion control systems to ensure that the robot can complete delicate operation tasks.
The team from the Swiss Federal Institute of Technology in Zurich demonstrated its ORCA robotic hand at the WBCD. It is a low-cost, open-source, and highly anthropomorphic robot hand with 17 degrees of freedom and a 60-degree bending ability at the wrist. It can directly use various human tools, complete tasks such as rotating objects and stacking building blocks, and also supports reinforcement learning and imitation learning.
The third category is the autonomous model, which trains the robot by "feeding" it a large amount of operation data, enabling the robot to make its own decisions and complete tasks independently without real-time human intervention.
However, current models still face key bottlenecks. Their generalization ability and adaptability to the dynamic changes of the real world are significantly insufficient, which essentially stems from the "triple dilemma" of training data: real-world data is of high quality but expensive to collect; open-source Internet data is large in scale but severely contaminated by noise; and synthetic simulation data is highly controllable but leaves a gap when transferred to real-world use.
This directly restricts the actual performance of the models in open environments and has become a key area where breakthroughs are urgently needed. For example, the teams from Carnegie Mellon University, Kuawei Intelligence, and the Georgia Institute of Technology all use the VLA (Vision-Language-Action) framework, yet they differ markedly in areas such as parsing complex instructions and understanding dynamic scenes in depth.
The TSC Consulting team from India (Source/WBCD)
In the short term, "human-robot collaboration" remains the mainstream paradigm, which means that robots first need to approach human operation levels in specific scenarios and single-point tasks and establish advantages in various dimensions. In the long run, with the maturity and breakthrough of technologies such as data flywheels, simulation training, and reinforcement learning, autonomous intelligence is the ultimate goal.
In this process, the transitional form of human-robot coexistence is not only a reasonable existence but also an inevitable stage in the industrial iteration.
The business world is far more real and complex than the laboratory. For enterprises in this wave, the primary task no longer depends solely on technological advancement but on how to accurately grasp the delicate balance between "technological maturity" and "market demand." In particular, accelerating the commercial validation in vertical scenarios is the key to survival in the fierce market competition.