In Silicon Valley, Chinese and American embodied-intelligence companies discussed solutions to four problems.
Text by | Zhou Xinyu
Edited by | Yang Xuan
Large-scale deployment is what every embodied-intelligence company is talking about this year.
The race in numbers is visible in embodied companies' production lines, prospectuses, and shipment volumes. In April 2026, Zhiyuan Robotics announced that its 10,000th robot had rolled off the production line; going from 5,000 to 10,000 units took just over three months. Unitree Technology's IPO prospectus likewise offers a glimpse of its aggressive commercialization: in 2025, its revenue reached 1.707 billion yuan, with shipments exceeding 5,500 units.
Behind these aggressive figures is the global expansion of Chinese robots offering "low prices and high performance." Wang Xingxing, founder of Unitree Technology, mentioned at the 2025 World Robot Conference that overseas revenue has accounted for more than 50% of Unitree's total revenue in recent years.
Among these embodied players, MagicLab, incubated by Dreame in 2024, is the youngest. It recently set a decidedly aggressive revenue target: a revenue scale of $14 billion by 2036.
To build its brand globally, the company recently held a press event in Silicon Valley. On April 28, 2026 (Pacific Time), in San Jose, home to companies such as Adobe, TikTok, and IBM, MagicLab hosted the Global Embodied Intelligence Innovation Summit (GEIS).
MagicBot Z1 of MagicLab performs for Zhang Yixing on-site. Photo by the author.
At the conference, MagicLab unveiled a series of new products, starting from the underlying model:
World model Magic-Mix: an "autonomous evolution model" independently developed by MagicLab. Magic-Mix consists of two engines: Magic-WAM, which enables robots to understand the real world, and Magic-Creator, which generates large volumes of virtual data offline. This lets Magic-Mix iterate continuously and autonomously through a closed loop of data generation, model training, real-world feedback, and data regeneration.
Magic-Mix architecture. Image source: MagicLab
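The closed loop described for Magic-Mix can be sketched in pseudocode-style Python. This is an illustrative reconstruction only: MagicLab has not published its API, and every class and function name below is a placeholder.

```python
class WorldModel:
    """Placeholder for the Magic-WAM + Magic-Creator pair: understands the
    world and generates synthetic training data offline."""
    def __init__(self):
        self.feedback_log = []

    def generate_data(self, n):
        # Synthesize n samples, conditioned on feedback accumulated so far.
        return [{"sample": i, "feedback_seen": len(self.feedback_log)}
                for i in range(n)]

    def update(self, feedback):
        # "Data regeneration": fold real-world outcomes back into the model.
        self.feedback_log.extend(feedback)


class Policy:
    """Placeholder for the robot's control model."""
    def __init__(self):
        self.samples_trained = 0

    def train(self, data):
        self.samples_trained += len(data)

    def rollout(self):
        # In reality: deploy on hardware and log real-world outcomes.
        return [{"success": True}]


def evolution_loop(iterations=3, batch=4):
    """One pass per cycle: generate -> train -> real feedback -> regenerate."""
    wm, policy = WorldModel(), Policy()
    for _ in range(iterations):
        policy.train(wm.generate_data(batch))  # data generation -> training
        wm.update(policy.rollout())            # real-world feedback -> regeneration
    return policy.samples_trained


# evolution_loop(3, 4) runs 3 cycles, training on 4 synthetic samples each.
```

The point of the structure is that no step requires a human in the loop: each cycle's real-world feedback conditions the next cycle's synthetic data.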
MagicHand H01: equipped with 20 DOF (degrees of freedom; human hands have roughly 24-27) and 44 high-resolution 3D tactile sensors, targeting fine manipulation in scenarios such as industrial manufacturing and service and nursing care.
MagicHand H01. Image source: MagicLab
Humanoid robot MagicBot X1: 180 cm tall, 70 kg, with 31 active DOF across the body and a maximum joint torque of 450 N·m. Built on a dual-power system billed as offering unlimited endurance, X1 can operate continuously, 24/7. The product comes in a standard version and a research version: the former prioritizes commercial deployment efficiency and works out of the box; the latter targets universities, laboratories, developers, and industrial partners, supporting low-level secondary development and appearance customization.
MagicBot X1. Image source: MagicLab
At the conference, Silicon Valley embodied "brain" and "body" companies such as OpenMind, PrismaX, and Chestnut Robotics also appeared on-site, presenting different solutions for the brain, the body, and data.
The following is a summary of the on-site discussions from "Intelligent Emergence":
Will training with machine-synthesized data yield better results than using real-world data?
The scarcity of high-quality data has long been a bottleneck in training embodied models. Collecting real-robot data suffers from high cost, long collection cycles, and limited scene coverage.
Machine-synthesized data is one solution. Its limitation, however, is the absence of real-world signals such as friction coefficients, latency, and tactile feedback, which fuels industry concern about the "sim-to-real gap."
Mixed-data training is the mainstream solution proposed by Chinese and American embodied-intelligence companies at present. Gu Shitao, president of MagicLab, said the company collects about 16,000 data samples per day and then expands that volume 10,000-fold through data synthesis. She noted that new energy vehicle manufacturing is a rich source for data collection, because products iterate quickly and 60-70% of its processes still rely on manual labor.
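The scale implied by those two figures is worth making explicit. A back-of-the-envelope calculation (the variable names are ours, not MagicLab's):

```python
# Back-of-the-envelope scale of the data pipeline described by MagicLab.
real_samples_per_day = 16_000   # real-robot samples collected daily
synthesis_factor = 10_000       # expansion via synthetic data generation

synthetic_per_day = real_samples_per_day * synthesis_factor
real_fraction = real_samples_per_day / (real_samples_per_day + synthetic_per_day)

print(f"{synthetic_per_day:,} synthetic samples per day")
print(f"real data is ~{real_fraction:.4%} of the daily mix")
```

In other words, on the order of 160 million synthetic samples per day, with real data making up well under 0.01% of the total volume.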
The emerging industry consensus is that the choice between real and machine-synthesized data depends on the specific training objective and application scenario.
Haozhi Qi, a scientist at Amazon's Frontier AI and Robotics Research Institute, mentioned that synthesized data is well suited to teaching machines single, basic reactive skills, but struggles to impart long-horizon skills such as making breakfast. There, real data must be brought into training, because building a sufficiently rich simulation environment is prohibitively costly.
Zhengyi Luo, a senior research scientist at NVIDIA's GEAR Lab, revealed that the team currently trains on roughly 50% simulation data for basic skills, plus 15% motion-capture data and 25% Internet video data for understanding human actions, with the remaining 10% being high-quality real-world data. He added that some companies even use social-media data to guide the design of robot bodies.
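One common way to realize such a mix in practice is weighted sampling across data sources at training time. The sketch below is our illustration, not NVIDIA's pipeline; only the percentages come from the talk.

```python
import random

# Reported GEAR Lab data mix, expressed as source weights (illustrative).
DATA_MIX = {
    "simulation": 0.50,      # basic skills training
    "motion_capture": 0.15,  # understanding human actions
    "internet_video": 0.25,  # understanding human actions
    "real_world": 0.10,      # high-quality grounding data
}


def sample_source(rng: random.Random) -> str:
    """Pick a data source with probability proportional to its mix weight."""
    sources, weights = zip(*DATA_MIX.items())
    return rng.choices(sources, weights=weights, k=1)[0]


# Sanity check: empirical frequencies should approximate the target mix.
rng = random.Random(0)
counts = {source: 0 for source in DATA_MIX}
for _ in range(10_000):
    counts[sample_source(rng)] += 1
```

A per-batch sampler like this lets the mix be tuned as a hyperparameter without restructuring the datasets themselves.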
Is VLA (Vision - Language - Action) the best solution for the embodied "brain"?
Due to its strong task generalization ability, VLA has become the most mainstream architectural paradigm for embodied models at present.
Yet consider a human spinning a basketball on one finger: the task relies on touch and proprioception, not vision. VLA architectures currently fall short on both of these perceptual channels.
At the GEIS conference, Haozhi Qi argued that VLA's popularity is tied to the maturity of hardware sensors: visual sensors are maturing, while tactile sensors remain at an early stage of development.
In his view, an embodied system must compensate for immature sensing modalities with other sensory inputs to keep the body operating. VLA, which uses vision and language to make up for the lack of touch, is therefore one of the best solutions available today. As sensors and hardware mature, the algorithms will iterate in turn.
The debate over the three major routes of dexterous hands: linkage, tendon, and direct drive
Currently, the core question regarding dexterous hand design is whether it should be like a human hand. Around this proposition, three design solutions have emerged: linkage, tendon, and direct drive.
Among them, the linkage design is the least human-like but offers low cost and easy control; the tendon design is the most human-like and capable of fine manipulation, but is expensive and hard to control; direct drive is a compromise that integrates the actuator directly into each joint, though its cost is not low and it still faces engineering challenges in force-transmission efficiency and thermal management.
The hybrid architecture route is a recently emerging technical solution for dexterous hands. Evan Tao, the founder of Chestnut Robotics and a former core member of Tesla's Optimus dexterous hand, introduced that the team has currently chosen the hybrid architecture route, mainly using a tendon structure that can perform fine operations, supplemented by an AI control and autonomous learning system. He mentioned that future solutions "will seek a balance between flexibility and engineering reliability."
How can robots truly achieve large-scale deployment?
At the data level, introducing real-world data is still considered the key for robots to truly understand application scenarios and learn complex task operations.
For example, Zizheng Li, CEO of XGSynBot, mentioned that their mixed-data strategy still introduces a small amount of high-quality real-world data, which keeps costs under control while improving the model's capability and generalization.
At the system level, Zizheng Li believes robots need to evolve from single-function devices into multi-task general platforms. XGSynBot's robotic arm, for instance, uses a modular system with six quick-change functions, so one robot can flexibly switch between different processes, broadening the range of deployment scenarios.
Finally, Jan Liphardt, the founder of OpenMind and an associate professor of bioengineering at Stanford University, summarized: the earlier robots enter the real world, the better.
He has found that laboratory environments cannot simulate all the complexity of the real world: overly bright light, muddy or wet ground, rusty door hinges, the load of multiple systems running simultaneously. Such real-world conditions often cause system failures once robots leave the lab.
Robots, then, should not stay confined to the laboratory before deployment. Jan Liphardt suggested putting robots into practical settings such as homes, schools, airports, kindergartens, and other public places as early as possible, to collect interaction data and iterate continuously.