The former head of Meituan Waimai's technology department starts a business to create a "restaurant world model" for the era of embodied intelligence.
The implementation of embodied intelligence is moving from the laboratory to the most real and bustling physical world.
AtomBite.AI has chosen a seemingly unglamorous but highly realistic scenario: the catering kitchen.
36Kr has learned that AtomBite.AI, an embodied intelligence company, recently completed a seed-round financing of tens of millions of yuan. The investment was led by Inno-Capital, with follow-on investments from the Tsinghua Alumni Seed Fund and well-known individual investors. The funds will be used mainly for research and development of the embodied world model for the catering scenario and for bringing the core products to market.
The core team of AtomBite.AI spent a long period on exploration and incubation before founding the company. This financing marks an initial validation of the project's feasibility, and the company has received product cooperation and deployment intentions from several leading domestic and international companies.
The founding team of AtomBite.AI has a distinct "Meituan gene".
Dr. Wang Dong, the founder and CEO, was formerly the technical leader of Meituan's food delivery division, managing a research and development team of a thousand people and leading the construction of the algorithm, data, and system architecture for food delivery that supports tens of millions of daily orders. Co-founder Li Tao once led Meituan's food delivery algorithm and data system and is one of the few technical leaders who have truly implemented "full-link data-algorithm-driven" operations. Co-founder Li Haozhe is a serial entrepreneur with years of experience in global business implementation.
In the past few years, the catering industry has been transformed time and again by SaaS, ordering mini-programs, and delivery scheduling systems. However, as global food delivery orders continue to rise, a long-neglected problem has become increasingly prominent: between the time a merchant prepares a meal and the time a rider picks it up, many physical operations still rely heavily on manual labor, such as packing, sealing, sorting, transfer, and delivery.
These processes may seem trivial, but they directly affect overall fulfillment efficiency. The losses caused by wrong orders, missed orders, and spills are passed on to users, merchants, riders, and the platform alike. At the same time, the global catering industry generally faces structural employment problems: hourly wages in the North American fast-food industry continue to rise, while domestic catering stores have long struggled with difficult recruiting and high turnover.
After leaving Meituan, Wang Dong conducted market research in North America and Singapore for several months, visiting a large number of catering merchants and food delivery platforms. He finally formed a clear judgment: the catering kitchen may be one of the most certain commercial implementation directions for embodied intelligence.
The reason is that this scenario has several key characteristics.
First, it is a common global demand. Whether in China, North America, or Southeast Asia, the catering industry faces the problems of rising labor costs and fulfillment efficiency.
Second, its ROI is clear enough. As long as a solution can reduce the wrong-order rate, reduce manual labor, and improve meal-preparation efficiency, merchants are willing to pay for it.
More importantly, compared with scenarios such as family and elderly care that emphasize emotional interaction, the catering industry belongs to the professional service field, with a shorter decision-making chain and a stronger willingness to cooperate among small and medium-sized merchants.
In an interview with 36Kr, Wang Dong said that the service industry itself accounts for a large proportion of global GDP. If a truly operable embodied solution can be established in the high-frequency scenario of the catering kitchen, closing the loop from model to application in a systematic way, that would be of great value in itself and could also extend to more complex scenarios such as home kitchens in the future.
Compared with many companies that prioritize the development of a "general embodied world model", AtomBite.AI prefers to continuously learn from real scenarios to gradually build model capabilities.
Wang Dong said, "The locomotion ability has basically been solved after seven or eight years of development. Now, the real focus of the industry has shifted to fine-grained operations. Although dexterous hands are still some way from being fully mature, there are already a large number of mature engineering solutions for two-finger and three-finger grippers, which can support the implementation of some standardized tasks."
Based on this judgment, AtomBite.AI does not focus on reinventing robot hardware but aims to develop a "World Action Model (WAM)" for the catering scenario.
In Wang Dong's view, the VLA (Vision-Language-Action) approach relies too much on the language module for high-level planning but lacks sufficient visual representation. In the real world, action control does not fundamentally depend on language. "The real action control path of humans does not rely that strongly on language. The more fundamental issues are visual understanding, physical understanding, and how to establish a mapping between actions and the real world."
Accordingly, AtomBite.AI emphasizes the exploration of the "VT-WAM" (Vision-Touch World Action Model), which integrates vision and touch at the model level. Wang Dong explained, "Vision can see objects but cannot see contact; touch cannot see the whole picture but can see success or failure. Vision represents the geometric aspect of the world, and touch represents the physical aspect. VT-WAM then integrates these two types of information through the latent space into a 'world-action model' that can predict the consequences of contact."
The world model requires not only visual perception but, more importantly, an understanding of the laws and causal relationships of the real physical world. He gave an example: whether a beverage cup contains water, how full it is, and whether it is cold or hot all affect friction, shifts in the center of gravity, and handling stability when a robot grabs it.
AtomBite.AI hopes to use multiple sensors such as vision and touch to jointly perceive the state of objects and to embed in the model a causal understanding of physical properties such as liquid sloshing and center-of-gravity change, so that the robot's actions are not just the result of data fitting but truly conform to the physical laws of the real world, thereby improving the stability and precision of grasping and operation.
From a technical architecture perspective, AtomBite.AI's system is roughly divided into three layers: the top layer is the embodied world model, which forms a perception of the kitchen environment and completes decision-making and action planning; the middle layer is the task orchestration and scheduling engine, which converts the perception results into specific execution plans and schedules the different devices in a unified way; the bottom layer integrates self-developed core components with general hardware bodies to ensure that the system can operate stably in a real kitchen for a long time.
The core logic behind this architecture is: instead of building a general robot first and then looking for application scenarios, the company continuously collects real interaction data in a high-frequency scenario with clear pain points and feeds that data back into the world model, making the model more and more "intelligent" in the physical world.
The catering kitchen repeats a large number of high-frequency operations every day: packing, sorting, handling, cooking, and transfer. These actions naturally generate a large amount of diverse real-world data that is difficult to produce through a simulation environment alone.
In terms of the specific implementation path, AtomBite.AI has chosen to start with "food delivery packing and transfer". In the entire fulfillment chain, this is the link with the highest error rate and relatively high standardization, and the one whose value is easiest to quantify.
"The task scope of the packing link is clear, and the scenario is controllable. Our approach is based on commercial value. We first focus on improving the grasping accuracy to make it usable and reliable," Wang Dong said. "From a commercial perspective, merchants don't care whether your robot looks like a human or can dance. They care more about what kind of work you can actually do for them."
Currently, AtomBite.AI's solution assigns high-frequency, standardized actions, such as packing set meals into boxes and sealing labels, to lightweight small models on the edge side in order to reduce latency and network dependence. The large cloud-based model mainly handles abnormal situations in complex scenarios, such as material shortages and foreign-object interference, and coordinates with the kitchen staff through the KDS system.
The packing-link model is expected to be deployed on a large scale in real merchant kitchens in 2026.
In the team's vision, the model's capabilities will start from the single-point packing link and extend to more complex kitchen operations, including sorting, delivery transfer, and cooking coordination, and may gradually enter a wider range of service-industry scenarios.
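The split between edge and cloud described here can be illustrated with a minimal routing sketch. This is not AtomBite.AI's implementation: the names `KitchenEvent`, `route`, `ROUTINE_ACTIONS`, and `ANOMALIES` are hypothetical, and the task labels are simply borrowed from the examples in the article. The sketch also assumes that any unrecognized action is escalated to the cloud.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels, borrowed from the examples mentioned in the article.
ROUTINE_ACTIONS = {"pack_set_meal", "seal_label"}
ANOMALIES = {"material_shortage", "foreign_object"}


@dataclass
class KitchenEvent:
    action: str                    # what the packing station is asked to do
    anomaly: Optional[str] = None  # set when perception flags a problem


def route(event: KitchenEvent) -> str:
    """Return the tier that should handle the event: 'edge' or 'cloud'."""
    # Flagged anomalies always go to the large cloud model.
    if event.anomaly in ANOMALIES:
        return "cloud"
    # High-frequency, standardized actions stay on the lightweight edge model.
    if event.action in ROUTINE_ACTIONS:
        return "edge"
    # Assumption of this sketch: anything unrecognized is escalated.
    return "cloud"


def needs_staff(event: KitchenEvent) -> bool:
    """Whether kitchen staff should be alerted (for example via the KDS)."""
    return event.anomaly in ANOMALIES
```

Under these assumptions, `route(KitchenEvent("pack_set_meal"))` stays on the edge, while the same action with `anomaly="material_shortage"` is sent to the cloud and flagged for staff. Keeping the routine path local is what removes the network round trip from the common case.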