
Three Questions about the "Year of Embodied Data"

具身研习社 (Embodied Learning Community), 2026-03-30 16:11
Is sheer data volume a turning point, or just an expression of brute-force aesthetics?

That 2026 is the "Year Zero" of data has become one of the few points of consensus in embodied intelligence. This is not an empty slogan but an outcome forced by the reality of "unworkable algorithms and non-generalizable scenarios" when embodied intelligence is deployed in the physical world.

Interestingly, if the "Year of Mass Production" in 2025 was more like a slogan from robot-body (hardware) manufacturers and a progress report to the market, the "Year of Data" in 2026 reflects a structural contradiction in the industry that both robot-body and model makers must resolve. The deeper difference is this: in 2025, robot-body companies that had not reached mass production still had room for error in technology iteration and capacity ramp-up. In 2026, manufacturers that fail to develop differentiated data solutions and remain stuck at the threshold of the Scaling Law may gradually exhaust the outside world's patience.

Thus, a competition over data collection devices has quietly begun. From large-scale VR-plus-controller teleoperation for collecting real-machine data on the robot body, to today's body-free collection devices such as UMI and Ego, and on to real-machine collection based on exoskeleton teleoperation, a new path has opened up between the binary opposition of real machine and simulation. This implies that the industry has yet to answer clearly whose data can win industry recognition and what kind of data the industry truly needs.

From the perspective of industrial development, this device competition has led robot-body manufacturers to rush to build their own data collection teams and model companies to flock to building hybrid data generation engines. Everyone is trying to escape the "data hunger" dilemma with differentiated strategies and strengthen their own data barriers.

But can this frenzy really quench the thirst? Are the manufacturers going all-in on data merely followers driven by industry anxiety, or have they truly dissected the underlying pain points and found a way out?

Set aside the often-discussed data issues for a moment and look at recent developments. At the end of 2025, Generalist's 270,000 hours of data made a big splash in embodied intelligence. The Scaling Law moment seemed to have arrived, yet some industry voices regard it as essentially "brute-force aesthetics".

After all, insufficient scale and richness is only one cause of data hunger. Refinement and high quality are equally necessary to solve the problem. The essence of data hunger has never been a "lack of data" but a "lack of useful data". That is the key to the year of embodied data: not "piling up scale" but "finding the right path".

Once the anxiety is set aside, the real core questions about embodied intelligence data begin to emerge.

Everyone is talking about the lack of data, but what type of data is actually lacking?

To be honest, many people have not yet realized that "data serves the scenario".

Many people misread the data pyramid. They assume that real-machine data is always the hardest to obtain and the most useful. In fact, the data pyramid is dynamic, not static: it depends on which problems are to be solved in which scenarios.

As Wang Xingxing, who is about to complete the "big test" of an IPO, said at Siemens' first technology summit, "Our company and the industry, whenever possible, use simulation to solve problems that can be solved in the simulation environment and through simulation training because it is fast and low-cost, and some parameters can be adjusted." This is common for the walking, running, and boxing of humanoid robots. He also admitted that "for the 'operation'-related actions of robots, simulation is not good enough, and globally, real-person data collection is still used for training." Of course, real-person data collection has its own pain points, such as constrained environment setup and high costs.

Wang Xingxing's words reveal how he chooses data: start from the "affordance" of the data itself and get practical feedback through scenario verification. Which type of data is lacking is not something manufacturers can guess in the laboratory; it is discovered in real scenarios.

Therefore, from the perspective of scenarios, both Internet data and the massive data sets built from a small amount of real-machine data plus simulation can enable embodied robots to complete large, sweeping actions. Such data helps robots move from the laboratory into the physical world, but there is still a long way to go before they deliver the productivity we want.

Internet video data lacks force and tactile feedback, and simulation data struggles to reproduce the physical characteristics of different materials (such as the softness of cloth and the smoothness of metal). Even the small amount of real-machine data mostly captures the movement of the robot body rather than the details of end-effector operations. Yet most productivity scenarios, such as home services, logistics sorting, and industrial manufacturing, require "body movement + a large number of end-effector operations". In other words, today's core data requirement centers on the end-effector, especially the dexterous hand with its complex "finger skills".

Tesla and Figure, the most talked-about companies abroad, have both confirmed that when their robots are actually deployed, the dexterous hands do little of the work. Many have the form of "hands" but do the work of "grippers".

Unfortunately, this type of data cannot be obtained through "brute-force collection". Unlike general-scenario data, dexterous-hand data is strongly scenario-specific. For the same "grasping" action, the force-control curves for glass products and rubber parts are completely different; for the same "twisting" action, the operating logic of a manual screwdriver and an electric screwdriver is fundamentally different. Manufacturers must therefore go deep into specific scenarios and collect "targeted data" with a range of data collection devices. This is the core reason for its scarcity.

This battle to fill the data gap is essentially a shift from "generalized data stacking" to "scenario-based, in-depth mining of refined data", and the depth of dexterous-hand data reserves will directly determine the productivity boundary of robots.

Will there be a single solution in the data collection competition?

The industry has now recognized the importance of dexterous-hand data collection; the next step is how to collect more accurate data.

The competition in data collection devices has reached a certain stage. Alongside the still-expanding real-machine data collection factories, UMI, Ego, and exoskeletons have emerged, each trying to break the deadlock at low cost and high efficiency.

UMI currently focuses on collecting operation data at the end of the robotic arm. It therefore cannot cover the robot's whole-body coordinated actions, and most UMI setups use a two-finger gripper as the end-effector and focus on gripper-related tasks, which limits its use in scenarios that require whole-body interaction.

However, for small and medium-sized enterprises focusing on single-operation tasks, UMI remains one of the best options for "balancing cost and accuracy" at the current stage. Ego emerged as the solution for the robot's whole-body coordinated actions. Ego, however, relies on a powerful algorithm backend to complete multi-dimensional reconstruction and data alignment. Moreover, both tend to supply large volumes of data for pre-training, so data quality problems accumulate and incur high costs in later stages.

Moreover, the two types of device, UMI and Ego, are becoming tightly coupled. For example, Luming Robotics and Jianzhi Robotics have each launched an Ego data collection device after their UMI products, and the two device types support and cooperate with each other. The industry regards the data collected by the two as complementary information sources.

However, stopping at this stage may not solve the problem of dexterous operation. On the one hand, UMI is limited to the gripper form. On the other hand, although egocentric human data such as Ego scales well, it still lacks sub-millimeter finger pose and tactile data. This makes it difficult for UMI, Ego, or even their combination to teach machines fine operation skills.

Therefore, we will see more new data collection hardware aimed at the "dexterous end". For example, U1, the world's first Real DexUMI, recently launched by BeingBeyond, is deeply influenced by the UMI paradigm. It integrates dexterous-hand hardware, robot-body interaction interfaces, dynamic tracking, and tactile perception into one system, letting a person naturally control another hand with their own. There is also the dexterous intelligent DexCap exoskeleton data collection system, which achieves full-dimensional dynamic capture of the human upper limbs and waist. On top of conventional visual teleoperation, it adds vibration force and tactile feedback at the hand, providing a useful and reliable data source for developing dexterous-hand products.

Of course, dexterous intelligence is not just a concept. As early as the first half of 2025, before the industry had recognized the importance of end-effector operation data, it had already carried out large-scale collection via the exoskeleton route. After nearly a year of technology iteration, and with UMI and Ego now widely discussed, talking about this device is not "technology archaeology"; it testifies to the device's necessity in cutting-edge data collection and its contribution to the industry's practical development.

In addition, there are many technology routes for collecting end-effector operation data. Besides UMI, Ego, and exoskeletons, data gloves based on optical motion capture, inertial motion capture, or IMU, quantum, fiber-optic, or elastic sensing can also collect end-effector data. However, market experience shows that these gloves are better suited to medium-to-low-precision, weak-magnetic-field environments, and the post-processing cost of their data is extremely high. Under current technological conditions, their potential for large-scale use is limited.

In short, the endgame of this device competition is never about "who replaces whom" but about "who can better integrate into the collaborative ecosystem". Exoskeleton devices have become a "must-have" for refined dexterous-hand data collection thanks to three core advantages: force and tactile reproduction, stable long-term output, and data standardization. UMI and Ego handle large-scale data accumulation with their high efficiency and low cost, while the various data gloves remain in technological incubation.

It must be emphasized that devices and data are, most of the time, not mutually exclusive. On the contrary, effective combinations should be tried.

Who should have the right to speak in data collection?

Interestingly, the embodied intelligence industry chain is gradually filling out. Previously it was broadly divided into two camps, robot body and brain. Now unicorns focused on data have emerged. For the moment, everyone in the market has data and wants to be king of the hill.

But who should have the right to speak in data?

The answer is those who understand the hardware. Whether they are robot-body, model, or data manufacturers, whoever better understands the hardware for data collection has the right to speak.

Data collection is not simply "recording actions". It requires the accurate conversion of physical-world operations into usable assets in the digital world, which relies on three core supports of hardware: data dimension, data quality, and data processing cost.

The data dimension requires that the collected information be rich enough. If it is not, the model will have to "guess" a great deal during training and will ultimately fail to converge. Data quality, to some extent, determines whether the model is fed coarse or fine-grained data. A model that is "fed coarsely" will itself be "rough" and unable to perform fine-grained tasks.

Finally, there is the data processing cost, also known as the "full-link processing cost": the cost of an entire "industrial production line" of processes, including pipeline construction before and after data collection, personnel training, data cleaning, and algorithm mapping. Together, these factors determine the success of data collection.
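
To make the three supports concrete, here is a minimal Python sketch of how a team might check one recorded demonstration against them. Everything in it is an assumption made for illustration: the modality names (`rgb`, `joint_positions`, `fingertip_force`, `tactile`), the `Episode` structure, and the three helper functions are hypothetical and do not come from any vendor's data format or tooling.

```python
from dataclasses import dataclass, field

# Hypothetical modality names, chosen only for illustration.
REQUIRED_MODALITIES = ("rgb", "joint_positions", "fingertip_force", "tactile")


@dataclass
class Episode:
    """One recorded demonstration: modality name -> list of per-frame samples."""
    streams: dict = field(default_factory=dict)


def missing_modalities(episode, required=REQUIRED_MODALITIES):
    """Data dimension: which required signals are absent or empty?"""
    return [name for name in required if not episode.streams.get(name)]


def dropped_frame_ratio(episode, modality, expected_frames):
    """Data quality: share of expected frames missing from one stream."""
    if expected_frames <= 0:
        raise ValueError("expected_frames must be positive")
    recorded = len(episode.streams.get(modality, []))
    return max(0.0, 1.0 - recorded / expected_frames)


def cost_per_usable_hour(total_cost, hours_collected, usable_fraction):
    """Processing cost: full-pipeline spend per hour that survives cleaning."""
    usable_hours = hours_collected * usable_fraction
    if usable_hours <= 0:
        raise ValueError("no usable data to spread the cost over")
    return total_cost / usable_hours


if __name__ == "__main__":
    # A gripper-style recording with vision and joint angles only.
    episode = Episode(streams={"rgb": [0] * 90, "joint_positions": [0] * 100})
    print(missing_modalities(episode))
    print(dropped_frame_ratio(episode, "rgb", expected_frames=100))
    print(cost_per_usable_hour(1000.0, hours_collected=10, usable_fraction=0.5))
```

In this toy example, the episode has no force or tactile stream at all, which is the kind of gap that leaves a model "guessing". The last function shows why processing cost matters: the same collection budget costs twice as much per usable hour when only half the recorded hours survive cleaning.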

At present, many data collection enterprises have not mastered all of these elements. However, some manufacturers and devices have taken the "dimension, quality, and processing cost" of data into account.

Image source: Demonstration video of dexterous intelligent DexCap (2X Speed)

Take the dexterous intelligent DexCap system as an example. It has all three core supports. "Full-dimensional dynamic capture" covers multi-dimensional data from the hand, both arms, and the waist, which is far richer than end-effector data alone. Features such as "kilohertz response and enhanced tactile perception" ensure a smooth, realistic 1:1 mapping between virtual operation and real actions. The exoskeleton hardware is also highly durable, with no loss of accuracy during long-term collection, so data quality remains stable. In addition, the device outputs data in a unified format with complete dimensions, which facilitates subsequent annotation, cleaning, and reuse and helps build a high-quality data set that can be continuously iterated.

It must be emphasized that BeingBeyond is a well-known dexterous-hand hardware manufacturer with notable achievements, especially in high-degree-of-freedom dexterous hands. It is closest to the "hand" and best understands what data for dexterous operations should look like.

Looking back, the core of the competition for the right to speak in data is control of the "digital entrance to the physical world". Robot-body manufacturers build the underlying infrastructure for data collection through lightweight, efficient, and collaborative hardware innovation, making large-scale production of high-quality data possible, while BeingBeyond relies on this infrastructure to turn data value into real, usable operation capabilities, forming a positive cycle of "hardware empowering data, data driving intelligence".

This is also the core logic of the embodied intelligence industry: whoever controls the "first contact point" of data collection, that is, the hardware, will have the initiative in the development of the entire industry.

Conclusion: The real meaning of the year of data

Seen in context, the "Year of Data" does not simply mean a sudden increase in data. It means the industry has begun to realize that data production is itself a core capability.

In the Internet era, data was often generated passively; in the era of embodied intelligence, data must be actively created. Data is therefore no longer easy to come by. The reality we must face is that it is more like an industrial output deeply bound to hardware. Whoever owns the equipment, controls the deployment, and understands the scenario can stably produce high-value data.

In the world of embodied intelligence, what is truly scarce has never been the data itself but the ability to stably produce high-value data. The case of BeingBeyond suggests that whoever is closer to real data and builds a stable, high-quality, scalable data production system is more likely to gain an advantage in future competition.

This article is from the WeChat official account "Embodied Learning Community", author: Peng Kunfang, Lü Xinyi. Republished by 36Kr with permission.