
Embodied AI: Data Sellers Profit First

猎云网 (Lieyunwang), 2026-06-16 16:24
2026 has already become the "First Year of Large-Scale Embodied Intelligence Data".

Another business of "selling shovels" has started to make money.

The "Hundred Models Battle" that took off in 2023 let the providers of computing hardware (the "shovel sellers") reap huge profits. A similar pattern is now playing out in the embodied intelligence industry, where data collection companies are raising money in quick succession and landing large orders:

In March, Guanglun Intelligence completed a 1-billion-yuan financing, becoming the world's first unicorn in embodied data. It has also disclosed orders worth 550 million yuan won in the first quarter of this year. In April, Wuwen Zhike completed a financing of over 100 million yuan and disclosed that orders signed in the first quarter reached hundreds of millions of yuan. Also in April, Yiren Technology completed two consecutive rounds, Pre-A+ and Pre-A++, each worth hundreds of millions of yuan. It announced that its revenue exceeded 100 million yuan in 2025 and that it turned profitable that year, and that its embodied intelligence orders in Q1 2026 exceeded its total revenue for all of last year. Zhiyuan has also spun off part of its business to establish Mifeng Technology...

It's not just startups eyeing this lucrative market; Internet giants are getting in on the action too. JD.com has released full-chain infrastructure for embodied intelligence data and plans a crowdsourced data collection project involving 600,000 people (for example, couriers and delivery riders wearing collection devices), aiming to accumulate 10 million hours of real-world human video data within two years. Baidu has chosen the "data supermarket" model...

The industry's soaring popularity reflects the logic summed up by Yao Maoqing, chairman and CEO of Mifeng Technology: "Before embodied intelligence is commercialized at scale, data, as infrastructure, will generate commercial returns earlier than end applications."

The data sources for embodied intelligence fall mainly into four categories. At the top of the pyramid is "real-machine data" obtained by remotely controlling real robots (teleoperation). It has the highest accuracy and also the highest cost, but it is key to deploying humanoid robots. Simulation/synthetic data sits in the middle layer: it is low-cost and scalable and can make up for the current shortage of real-machine data. Human videos, including Internet videos and human behavior data, form the bottom of the pyramid; they come from a wide range of sources and generalize well. The fourth category, UMI (Universal Manipulation Interface), is a low-cost, body-less data collection paradigm and technical standard in embodied intelligence, meaning it does not depend on a specific robot body.

Source: ZuoSi Auto Research, "Research Report on the Data Industry Layout of Embodied Intelligent Robots in 2026"

As the industry has developed, the data collection track has split into roughly four major schools. For "real-machine data", leading robot companies such as Zhiyuan run a closed loop of "body + data", making the data business a natural extension of their in-house capabilities. For simulation/synthetic data, there are startups positioned as data infrastructure providers, such as Guanglun Intelligence. Cross-industry platform giants such as JD.com and China Mobile enter the market on the strength of their industrial scenarios and use a hybrid collection model. "UMI-type companies" such as Luming Robotics and Songling Robotics focus on standardized, modular collection hardware.

Clearly, 2026 has become the "First Year of Large-Scale Embodied Intelligence Data". Manufacturers of all kinds are staking out positions in the embodied intelligence industry as "data service providers", drawn by inelastic demand, high barriers to entry, and replicable business models.

A Gap of Over 99% Gives Rise to New Unicorns in "Synthetic Data", and 3 Companies Have Won Hundreds of Millions in Orders

Training large embodied intelligence models (VLA models and world models) requires massive amounts of multimodal, high-fidelity physical interaction data. Yet as of early 2026, the world's total stock of high-quality real physical interaction data was only about 500,000 hours. The industry consensus is that training a general embodied model requires at least tens of millions of hours, leaving a gap of over 99%.
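A quick worked check of the gap arithmetic may help. The article gives the available stock (about 500,000 hours) but only a range for the requirement ("tens of millions of hours"), so the two requirement values below are illustrative assumptions, not reported figures, and the gap is defined simply as the unmet share of the requirement:

```latex
\begin{aligned}
\text{gap} &= 1 - \frac{H_{\text{avail}}}{H_{\text{req}}}, \quad H_{\text{avail}} \approx 5 \times 10^{5}\,\text{h} \\
H_{\text{req}} = 1 \times 10^{7}\,\text{h} &\;\Rightarrow\; \text{gap} = 1 - 0.05 = 95\% \\
H_{\text{req}} = 5 \times 10^{7}\,\text{h} &\;\Rightarrow\; \text{gap} = 1 - 0.01 = 99\%
\end{aligned}
```

On this simple definition, a gap above 99% corresponds to a requirement above roughly 50 million hours; at the low end of "tens of millions" the gap is closer to 95%.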

This supply-demand imbalance has made data a scarce resource, and buyers are taking "as much as is available". Data collection has thus become a new kind of "shovel" within the embodied intelligence industry, once again confirming the logic that "data comes first, and shovel sellers get rich first".

Notably, star startups such as Guanglun Intelligence, Wuwen Zhike, and Yiren Technology did not all start out in embodied intelligence. Most were founded during the intelligent driving wave and chose the simulation/synthetic data route, then gradually expanded into real-machine data, forging a new, integrated path.

In 2023, with breakthroughs in large language models (LLMs) and vision-language models (VLMs), the industry broadly began exploring how to give robots "brains", making the leap from traditional automation to "embodied intelligence" with perception and decision-making capabilities. In February 2023, Zhiyuan was founded and quickly launched its first humanoid robot, drawing wide attention from investors and the technology community; it is regarded as a landmark of China's embodied intelligence startup wave.

Guanglun Intelligence, founded in January 2023, positions itself as a synthetic data company providing synthetic data solutions that help enterprises deploy AI. In March 2026, it announced the completion of 1 billion yuan in Series A++ and A+++ financing, becoming the world's first unicorn in embodied data. In May, it closed a new round led by Ant Group at a post-money valuation of over $2 billion, doubling its valuation in just two months.

Source: Tianyancha

Although Wuwen Zhike was founded in November 2022, it officially started operations in May 2023, six months later. In its official website introduction, Wuwen Zhike mentions intelligent driving multiple times. It mainly applies AI - driven large - model simulation technology to ensure the safety of intelligent driving vehicles on the road.

In 2024, interest in the embodied intelligence industry kept rising. Among the star unicorns, Zhiyuan and Yushu Technology each raised two rounds that year.

Riding the embodied intelligence wave, when Wuwen Zhike announced its angel round in August 2024, it noted that "the company is rooted in and deeply engaged with the intelligent driving/autonomous driving track and will gradually expand into the robotics and embodied intelligence tracks". By April 2026, when it announced new financing of over 100 million yuan, the company had repositioned itself as a "physical AI data base enterprise".

According to reports, Wuwen Zhike operates out of the Yangtze River Delta (Deqing) Embodied Intelligence Data Collection Training Ground, the first in China to close the loop between virtual and real environments. There it can produce thousands of hours of data per day, generate large-scale synthetic data by the tens of thousands, and run tens of millions of simulation verifications. Its long-term customers include leading companies such as Xingdong Jiyuan, Tashi Zhihang, Lingxin Qiaoshou, and Lingcifang. In Q1 2026, it signed orders with ByteDance, Wujie Power, Zhangyu Power, and others. It currently has hundreds of millions of yuan in orders on hand, and its revenue this year is expected to exceed 100 million yuan.

Guanglun Intelligence has combined human data and simulation into a closed-loop infrastructure, and its human data delivery volume ranks first in the world. Its human video data products cover more than 25,000 environment nodes and over 100,000 task types, with over 1.5 million hours of high-quality human data delivered cumulatively. In 2025, its annual revenue grew tenfold. In April, it announced that its Q1 2026 revenue alone was expected to exceed its total revenue for 2025, and in May it announced new orders worth 550 million yuan in Q1 2026.

An even more typical example is Yiren Technology, founded in March 2013. Drawing on the vehicle perception network it built up in autonomous driving, it pivoted in time to collecting embodied intelligence data. In 2025, its AI data business revenue exceeded 100 million yuan, making it the first company in China to turn a profit on AI data. It has also deployed applications across multiple embodied intelligence scenarios and won orders from leading customers; in Q1 2026 alone, it received over 100 million yuan in new embodied intelligence data orders.

The prospectus of Yushu Technology, founded in 2016 and already approved for listing, shows that it took six years, until 2022, to reach operating revenue of about 123 million yuan, and that it only turned a profit in 2024, with net profit of about 95.47 million yuan.

By contrast, startups such as Guanglun Intelligence and Wuwen Zhike are on track to exceed 100 million yuan in revenue in 2026, just three years after being founded, one example of "data sellers making money first".

Zhiyuan Enters the Fray, and JD.com, Baidu, and China Mobile Follow Suit

As robot bodies mature, industry and academia widely recognize high-quality data as the key to closing the gap in general-purpose, fine-grained manipulation. How to obtain physically authentic multimodal data at low cost and at scale has become the deciding factor for the commercial deployment of embodied intelligence over the next five years.

Yao Maoqing, a partner, senior vice president, and president of the embodied business division at Zhiyuan, understands this well. As early as May 2024, he led the construction of the industry's largest (4,000 square meters) and most scenario-rich data collection super-factory. By deploying nearly a hundred Expedition A2-D special-purpose machines, it can collect thousands of data items per machine per day, making Zhiyuan a representative player in today's "real-machine data" field.

Just six months later, Zhiyuan, together with the Shanghai Artificial Intelligence Laboratory, the National-Local Joint Innovation Center for Humanoid Robots, and Shanghai Kupasi, open-sourced AgiBot World, the world's first million-scale real-machine dataset built on full-domain real-world scenarios. On that basis, Zhiyuan, which pursues a tightly coupled "body-data-model-scenario" closed loop, was named in April 2026 as one of the top three players in China's embodied intelligence data track, alongside the independent data provider Guanglun Intelligence and the national-level public platform, the National-Local Joint Innovation Center for Humanoid Robots.

Source: ZuoSi Auto Research, "Research Report on the Data Industry Layout of Embodied Intelligent Robots in 2026"

Yao Maoqing has stressed more than once that the robot industry's current bottleneck is not computing power but data: "High-quality real-machine data is the key prerequisite for the emergence of intelligence." He has also noted that although the industry has plenty of simulation data, it cannot replace the fine-grained perceptual information produced by real physical interaction. Zhiyuan's strategy is to "focus on real-machine data, supplemented by simulation data", since only data collected in real scenarios can truly drive a qualitative change in robot intelligence. The company has a clear quantitative goal: to accumulate tens of millions of hours of real-world scenario data within two years.

Mifeng Technology's data collection system is also crucial to reaching these goals. In February 2026, Yao Maoqing drove the spin-off of part of Zhiyuan's business into Mifeng Technology and became its chairman and CEO. The company focuses on embodied intelligence data infrastructure, applies and promotes UMI technology extensively (though it is not purely a "UMI-type company"), and is building an independent, open, one-stop physical AI data service platform. Just ten days after its founding, Mifeng Technology completed seed and angel rounds totaling hundreds of millions of yuan.

According to Pengpai Technology, embodied intelligence data is currently priced at 200-500 yuan per hour overall. Real-machine data, collected by robots operating in real-world scenarios, is the most sought after and the most expensive because it is best suited to training models for real-world deployment; its current domestic market price is 500-1,000 yuan per hour. According to Yao Maoqing, once production capacity stabilizes, the price of body-less data, which does not depend on a specific robot body, will eventually converge to one-half to one-third of the real-machine price. For example, if real-machine data sells for 1,000 yuan per hour, body-less data may eventually settle at 300-400 yuan.
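The price-convergence claim can be worked through directly. Writing $P_{\text{real}}$ for the hourly price of real-machine data and $P_{\text{bl}}$ for body-less data, and taking the one-half to one-third ratio literally (a simplifying assumption; actual prices will depend on the market):

```latex
\begin{aligned}
\tfrac{1}{3}P_{\text{real}} &\le P_{\text{bl}} \le \tfrac{1}{2}P_{\text{real}} \\
P_{\text{real}} = 1000 &\;\Rightarrow\; 333 \lesssim P_{\text{bl}} \le 500\ \text{yuan/h} \\
500 \le P_{\text{real}} \le 1000 &\;\Rightarrow\; 167 \lesssim P_{\text{bl}} \le 500\ \text{yuan/h}
\end{aligned}
```

The 300-400 yuan example therefore sits near the one-third end of the stated ratio, and the full body-less range largely overlaps the 200-500 yuan overall band reported by Pengpai Technology.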

Given data's scarcity and high prices, quick-reacting Internet giants and industrial heavyweights, not just embodied intelligence players, have moved in concert this year, entering the "lucrative" data collection field and integrating themselves firmly into the embodied intelligence industry chain.

Among them, Internet giants such as Baidu have adopted the "data supermarket" model. On April 10, Baidu Smart Cloud, together with embodied intelligence companies including Lingcifang, Lingsheng, Fourier, Weiti Technology, Tuoyuan Wisdom, Shutu Technology, and Songying Technology, launched the "Embodied Intelligence Data Supermarket (Beta)", pioneering a tiered, scalable data labeling system to speed up the large-scale deployment of embodied intelligence.

Notably, Luming Robotics, whose data is mainly based on the UMI protocol, also chose the "data supermarket" model, launching the industry's first "FastUMI Pro Data Supermarket" in March 2026.

As large models' data requirements grow exponentially, no single technical route can meet the demanding requirements of "scale, cost, accuracy, and generalization". The industry is moving wholesale into an era of multi-source integrated collection: human videos inject general physical common sense, large volumes of synthetic simulation data cover long-tail edge cases, distributed UMI collection expands the range of real-world interaction actions, and high-precision teleoperation provides expert-level fine-tuning for vertical scenarios.

Take JD.com as an example. On March 16 this year, it announced an embodied intelligence data collection center covering five core scenarios (logistics and warehousing, industrial manufacturing, healthcare, home services, and urban operations and maintenance) that records multi-dimensional data such as vision, touch, and spatial trajectories. It will mobilize hundreds of thousands of people, including over 100,000 internal employees and up to 500,000 external participants, to collect data. It plans to accumulate 5 million hours of real-world human video data within one year and over 10 million hours in total within two years, while also collecting 1 million hours of robot body data.
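For a sense of scale, JD.com's two-year target can be divided across its participant pool. This assumes, purely for illustration, that all of roughly 600,000 participants (about 100,000 internal plus up to 500,000 external) take part and that collection is spread evenly over 730 days; the article states neither assumption:

```latex
\begin{aligned}
\frac{1 \times 10^{7}\,\text{h}}{6 \times 10^{5}\ \text{people}} &\approx 16.7\ \text{h per person over two years} \\
\frac{1 \times 10^{7}\,\text{h}}{730\ \text{days}} &\approx 13{,}700\ \text{h per day}
\end{aligned}
```

Under these assumptions, each participant would contribute only a modest amount of footage, and the program's scale comes mainly from the size of the crowd.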

It is reported that JD.com's embodied intelligence data collection center mainly adopts the method of collecting real - world scenario data from the human first - person perspective (Egocentric) based on wearable devices and