An aside to the route debate: after 270,000 hours, Generalist wants to flip the table on real-robot data collection sites.
The decisive watershed in the data race is no longer which technical route to back, but whether to return to the first principles of data collection: the pursuit of scalable, reusable, and evolvable data streams at massive volume. Traditional teleoperation models fixated on a single robot embodiment and high-cost annotation not only struggle to supply the data deluge the Scaling Law demands, but also fundamentally depart from the basic logic of generalizable intelligence.
On November 4, 2025, Generalist AI, a Silicon Valley robotics company, announced striking news to the industry: its GEN-0 embodied foundation model had completed training on 270,000 hours of human operation video data, verifying for the first time that the Scaling Law holds in robotics. The industry has hailed this as embodied intelligence's "ChatGPT moment."
Image source: Generalist
What does 270,000 hours mean?
This volume far exceeds every publicly available real-robot dataset today, and it is still growing by roughly 10,000 hours per week. The contrast is stark: the real-robot teleoperation collection model, once regarded as the "pinnacle" of the data pyramid, has hit an efficiency ceiling it cannot break through. Its slow rate of accumulation simply cannot meet the exponentially growing data scale the Scaling Law requires.
Real-robot teleoperation data collection is, at its core, a linear accumulation process constrained by the physical world. The typical model builds an offline data factory around a specific robot hardware platform, with operators demonstrating tasks by teleoperating real robots. Several inherent characteristics of this model make it hard to keep pace with the Scaling Law:
Linear growth versus exponential demand: The Scaling Law says model performance improves with data scale according to a power law, which means the data volume must expand by orders of magnitude. Real-robot teleoperation collection, however, scales by adding headcount and machine time, so it grows linearly; every data point costs real hardware wear, physical motion time, and labor. Even a collection base staffed by hundreds of people typically yields only tens of thousands of hours per year, nowhere near the data deluge the Scaling Law calls for (a rough numeric sketch follows this list).
The "anchoring effect" of physical hardware: The complex processes of deploying, debugging, and maintaining real robots make the data collection system rigid and cumbersome, unable to achieve flexible and rapid large - scale expansion. The data accumulation rate is firmly locked by the capabilities and availability of physical hardware. A practitioner frankly stated, "The data production capacity ceiling of the physical factory we built with all our efforts is clearly visible. This model cannot support us in moving towards a scaled model."
Spending exorbitant sums on large-scale collection still yields only a dataset in the millions. "Even if we open-sourced the dataset we painstakingly built, it would be a drop in the bucket against the industry's predicament," a practitioner in embodied intelligence once told the Embodied Intelligence Research Society.
Clearly, even though real-robot teleoperation data is higher in quality, the industry still needs a path that solves the data-scaling problem. While waiting for real-robot data to unlock large-scale growth, Generalist's solution represents one alternative.
There is, to be fair, no right or wrong among technical routes; what matters is whether a development path can keep up with the AI Scaling Law. Which leaves a seemingly intractable question: how do you break through the scale bottleneck of data collection?
How can this problem be solved?
To solve the problem, first ask what the robot needs
The first principles of the problem should return to the "language" of embodied robots. The industry's core proposition has never been to blindly expand market size in pursuit of a superficially bigger pie, but to listen closely to what embodied robots actually need: what scenario environment, technical support, and data nourishment does a robot require to go from technology showpiece to industrial tool?
The value of embodied robots is realized through the deeper logic of being put to work: scenario applications must simultaneously satisfy necessity, long-term effectiveness, and economies of scale. These three form the underlying support for industrial deployment. Necessity is the precondition for a scenario to exist, pointing at core pain points the industry has not yet solved; long-term effectiveness determines whether the value is sustainable rather than a short-lived gimmick; and economies of scale are the key to industrial scale-up, supporting the positive cycle of technical iteration and a closed commercial loop.
The performance and demo scenarios so common in the industry today are essentially just "scenario slices" from the early stage of commercialization. They showcase technical progress vividly and attract market attention, but they are far from the full picture of industrial deployment. The real destination for embodied robots is to become collaborative partners of human labor:
On one hand, they can free people from repetitive, low-value, and tedious tasks; on the other, they can take over high-risk, high-load work. Ultimately, they can be deeply integrated into core industrial scenarios such as factory production, commercial services, and special operations, delivering a leap in labor efficiency and an upgrade of the production model.
Deployment in core industrial scenarios cannot be carried by the stage-performance model that relies on pre-set programs to execute standardized actions. It requires embodied robots to break free of mere action replication and to deeply understand the structure and dynamics of the physical world, including real-time adaptation to environmental variables, accurate perception of object attributes, and the fault-tolerance boundaries of task execution.
In other words, embodied robots must not only be able to do the work but also understand how to do it: they need to know what "doing it right" means in different scenarios and grasp the logic behind their actions, rather than mechanically executing pre-set instructions.
This "understanding how to do" ability is essentially the systematic decomposition, reproduction, and optimization of human behavior patterns. Compared with large, gross motions such as swinging a limb, the real difficulty in long-running industrial scenarios is concentrated in fine-grained interaction capabilities: tactile feedback, force-control accuracy, and environmental perception.
Fei-Fei Li, often called the "Godmother of AI," analyzed this problem in depth in her recently published manifesto on spatial intelligence. Spatial intelligence, she argues, plays a foundational role in how humans interact with the physical world; we rely on it every day for seemingly ordinary actions: judging the shrinking distance between the front of the car and the curb while parking, catching keys tossed from across the room, or pouring coffee into a cup without looking while half asleep.
Image source: Screenshot of the A16Z account
Yet giving robots this ability faces severe challenges. As Fei-Fei Li put it plainly, "A core challenge in developing these robots is the lack of training data suitable for various embodied forms."
In practice, this means robots need far more detailed physical-interaction data: how to handle the rebound of keys when typing; how much force it takes to unscrew the cap of a mineral-water bottle, given that the bottle is not a rigid body and deforms slightly in the hand. Sufficient, high-quality fine-grained data is the nourishment embodied robots need to execute tasks accurately, and precisely because humans find this knowledge hard to articulate, it has become a major pain point constraining large-scale application.
Without a complete data closed loop to feed on, a robot's interaction and execution easily go out of control, which is the root cause of many of the industry's deployment failures. The "embarrassing moments" of embodied robots circulating on social media are direct symptoms of this missing fine-grained capability: a bottle crushed while its cap is unscrewed, because the robot cannot modulate force for different materials and degrees of tightness; an entire row of blocks knocked over, because it cannot accurately perceive spatial positions and dynamic collisions; parts damaged or misassembled in industrial assembly, because it cannot process tactile feedback on small components.
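To illustrate what "fine-grained" means in practice, here is a minimal, hypothetical control-loop sketch for the bottle example above. The interfaces (read_slip, read_deformation, set_grip_force) and all thresholds are invented placeholders, not any real robot API; the sketch only shows the kind of tacit force-versus-deformation trade-off that demonstration data would have to capture.

```python
# Hypothetical sketch: grasping a deformable bottle with just enough force.
# Sensor/actuator callbacks and all thresholds are illustrative assumptions.

MAX_FORCE_N = 15.0        # assumed crush limit for a thin plastic bottle
FORCE_STEP_N = 0.5        # how much to tighten per control tick
DEFORM_LIMIT_MM = 2.0     # back off if the bottle wall caves in more than this

def grasp_bottle(read_slip, read_deformation, set_grip_force) -> float:
    """Raise grip force until slip stops, backing off if the object deforms.

    Returns the final commanded force. In a learned policy, this trade-off is
    exactly the tacit, hard-to-articulate signal the article describes.
    """
    force = 1.0
    for _ in range(200):                                     # bounded control loop
        set_grip_force(force)
        if read_deformation() > DEFORM_LIMIT_MM:
            force = max(0.5, force - FORCE_STEP_N)           # crushing: ease off
        elif read_slip():
            force = min(MAX_FORCE_N, force + FORCE_STEP_N)   # slipping: tighten
        else:
            break                                            # stable grasp reached
    return force
```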
These seemingly trivial mistakes expose the industry's core shortcoming: without fine-grained capability, embodied robots struggle with the complexity and uncertainty of real scenarios. The root cause of this capability gap is the lack of training data that is both physically authentic and producible at scale. As long as the industry is trapped by this gap, growth in orders and shipments on paper will not translate into real large-scale deployment. The true turning point will begin with a fundamental breakthrough in the data supply needed to cultivate these core capabilities.
The real robot is not a panacea; only data at scale touches the Scaling Law
Having established that fine-grained interaction ability is the core bottleneck for deploying embodied robots, we need to examine the data system that supports it. The industry has long used the "data pyramid" as its grading standard.
The pyramid has three layers: at the bottom, large volumes of publicly available Internet data and human operation video data; in the middle, simulation-synthesized data; at the top, real-robot teleoperation data with the highest value density.
At present, the data that can truly teach embodied robots to interact deeply with the physical world and execute work tasks comes mainly from the real-robot teleoperation data at the top of the pyramid and the physically parameterized simulation-synthesized data in the middle layer.
Real-robot teleoperation data is gathered by operating embodied robots in real industrial scenarios and covers fine-grained signals such as tactile feedback, force-control parameters, and the dynamics of environmental interaction. In short, it is like tutoring an embodied robot one-on-one in how to work: a teleoperation collection site of several hundred people annotates data around a single robot embodiment, success rates on individual tasks are relatively high, and every motion trajectory bears the imprint of human operation.
The core value of real-robot teleoperation data lies in its high-fidelity recording of the real physical world. Complex physical interactions in real environments, such as contact dynamics, changes in friction, object deformation, and force feedback, are all captured in full. These physical details, especially nonlinear dynamic quantities such as contact and friction, give robots the most direct and authentic experience of physical interaction, which is exactly why this data is regarded as the "pinnacle" of the data pyramid.
Yet it is precisely the way it is collected that gives real-robot teleoperation data its pain points.
First, robot embodiments in the industry have not yet converged. Even robots of the same height may have different arm lengths and therefore different motion trajectories, which makes it hard to deploy collection across embodiments. When the hardware is iterated or customer requirements change, previously collected data assets are difficult to reuse, and the collection model ends up driven by selling hardware rather than by data-driven scaling.
Second, data collection consumes large amounts of labor and material resources, a financial burden few companies can bear. Most data collectors are part-time workers, and in some cases entire scenario collections are outsourced to third-party firms, which inevitably affects data quality.
It is evident that many objective factors keep real-robot teleoperation data from touching the Scaling Law. And the Scaling Law, under which model performance improves predictably as data volume and compute grow, is precisely the benchmark the data side of embodied robotics must answer to.
Generalist AI's breakthrough is precisely a proof of what data at scale can do. Its GEN-0 embodied foundation model verified the Scaling Law in robotics for the first time, using 270,000 hours of human operation video data. More importantly, Generalist adopted the UMI (Universal Manipulation Interface) solution, which decouples the collection device from the robot body, so it can be deployed flexibly across thousands of homes, warehouses, and workplaces worldwide, achieving genuinely large-scale collection.
Image source: Generalist
On another path to data scaling, simulation-synthesized data also shows the potential to touch the Scaling Law, with an additional edge in economic efficiency: a single set of simulated scene assets can be adapted to train robots of different embodiments without rebuilding the environment for each one.
More importantly, simulation can quickly generate large volumes of diverse training data in a virtual environment, with clear advantages in cost control and deployment flexibility. In embodied intelligence, where a pre-training dataset barely exists (there are no millions of robots continuously collecting data in factories, workshops, and homes), this enormous gap is exactly what a rapidly expandable, cost-controllable source such as simulation-synthesized data is suited to fill.
Simulation-synthesized data thus addresses both the shortage of data and the difficulty of scaling it, while dramatically lowering the cost of accumulating data assets. Together, these two advantages open the door for simulation to supply embodied robots with data in volume.
Just as important, simulation-synthesized data covers the need for fine-grained data and carries generalization ability. A simulation environment can precisely model fine-grained quantities that are hard to capture on real hardware, such as tactile feedback and force-control thresholds, and by varying scene parameters (object materials, lighting, task flows) it can generate data with scenario-level generalization, helping robot policies adapt to a wider range of real scenarios.
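As a rough illustration of that last point, the sketch below randomizes a handful of scene parameters to fan one task template out into many physically varied training episodes. The field names and value ranges are assumptions for illustration, not the schema of any particular simulator.

```python
import random

# Illustrative domain-randomization sketch: one scene template fans out into
# many varied episodes. Field names and ranges are assumptions, not a real
# simulator's schema.

MATERIALS = {                       # assumed friction / stiffness presets
    "plastic_bottle": {"friction": 0.35, "stiffness": 0.2},
    "glass_cup":      {"friction": 0.50, "stiffness": 1.0},
    "cardboard_box":  {"friction": 0.60, "stiffness": 0.4},
}

def sample_episode(task: str, rng: random.Random) -> dict:
    name, props = rng.choice(list(MATERIALS.items()))
    return {
        "task": task,
        "object": name,
        # jitter nominal physical parameters so the policy cannot overfit to one value
        "friction":  props["friction"]  * rng.uniform(0.8, 1.2),
        "stiffness": props["stiffness"] * rng.uniform(0.8, 1.2),
        "lighting_lux": rng.uniform(200, 1200),          # varied environment lighting
        "object_pose_xy": (rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)),
    }

rng = random.Random(0)
episodes = [sample_episode("pick_and_place", rng) for _ in range(10_000)]
print(episodes[0])
```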
The commercial value of simulation-synthesized data has already been demonstrated in practice. Galaxy General has made simulation the core of its R&D path and launched the "Galaxy Space Capsule," now deployed widely across the country; its close engagement with each customer shows the simulation route's potential for commercial conversion. This is not an accident but the natural result of the fit between the strengths of simulation data and industrial requirements: the large volumes of data accumulated in simulation let robots execute more stably and accurately in real scenarios, paving the way for commercial scale-up.
From the standpoint of industrial development, both Generalist's verification of the Scaling Law with 270,000 hours of human operation video data and the scaling potential shown by simulation-synthesized data point to the same core proposition: how to obtain large volumes of high-quality training data efficiently.
The industry should stay objective and prudent and return to the core logic of being demand-driven. Achieving data scale is the immediate priority. Companies still building teleoperation collection sites around a single embodiment are, in essence, packaging a hardware-sales business under the banner of data collection, and their data will struggle to compete under the Scaling Law.
Simulation is also a viable path: the co-evolution of physical authenticity and scale efficiency
Generalist's verification of the Scaling Law in robotics with human operation video data shares the same data logic as simulation-synthesized data: both aim to break through the physical limits of collection and achieve high reusability and efficiency at scale. The difference is that Generalist collects cross-embodiment data in the real world through the UMI solution, while simulation builds its data pipeline in a virtual environment.
It is worth noting that, in scaling potential, simulation-synthesized data is showing momentum no less than that of human operation video data. Fei-Fei Li, the "Godmother of AI," pointed out in her long-form article "From Language