Hard Kr Exclusive | Former Xiaomi executive Tang Mu's coffee robot startup raises hundreds of millions of yuan, with backing from Lin Bin and Li Wanqiang
Author | Qiu Xiaofen
Editor | Yuan Silai
Hard Kr has learned that "Yingzhi XBOT", a general-purpose embodied-robotics company for catering, has completed two consecutive financing rounds totaling several hundred million yuan. The 200-million-yuan Series A was invested by Hong Kong's Jiankun Capital GPTX, and the 300-500-million-yuan Series B was jointly invested by multiple government funds, US-dollar funds, and industrial investors.
This ranks among the largest financings to date in the catering-robot vertical.
Before this, "Yingzhi XBOT" completed an angel round with a star-studded lineup of investors: Tencent senior vice president Zhang Xiaolong; Xiaomi co-founders Li Wanqiang, Huang Jiangji, Lin Bin, and Hong Feng; and Guo Yike, Provost of the Hong Kong University of Science and Technology.
"Yingzhi XBOT" was founded in 2022. Its founder, Tang Mu, can be regarded as an "outsider" in the robotics world: he is a rare product-manager-turned-CEO in the robot industry.
He previously served as general manager at Kingsoft Software and Tencent CDC, where he was responsible for the experience design of products such as QQ and QZone. He later became vice president of the Xiaomi Ecosystem Chain, leading flagship best-sellers that shipped in the tens of millions of units, such as the Xiaomi Router and the Xiaomi Smart Speaker. He has 25 years of experience building products.
Yet while the embodied-intelligence sector is captivated by the narrative of humanoid robots entering homes and factories, "Yingzhi XBOT" has chosen a path that looks less "cool" but can be productized and commercialized quickly: having robots make coffee in the corners of shopping malls.
Unlike most mainstream humanoid-robot companies, "Yingzhi XBOT" designs its technical architecture around catering scenarios. At its core is the "XOS 3.0 Embodied Operating System", which follows a "one brain, multiple forms" architecture.
According to Tang Mu, the system is divided into three layers:
Brain layer: Handles high-level cognition and task planning. It runs the Zhiwei Catering Large Model, built on a DeepSeek base model and trained with real production data from 4 million cups of coffee.
Cerebellum layer: Converts semantic instructions into joint-level control signals with a response time of under 10 milliseconds. It has accumulated more than 50 atomic skills for catering actions.
Body adaptation layer: Enables reuse across robot forms ("develop once, reuse across all forms"). The control logic of the coffee arm can be migrated to other "Yingzhi XBOT" robots, such as its ice-cream and cocktail-making robots.
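The company has not published XOS internals, so the following is only a minimal Python sketch of how a brain / cerebellum / body-adaptation split could let one skill library serve several robot forms. Everything in it is hypothetical: the recipe table stands in for the large model, the skill table stands in for the atomic-skill library, and the "adapter" is reduced to a simple coordinate offset where a real system would run inverse kinematics and joint-level control (the sub-10 ms loop is not modeled).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Body-independent target position (x, y, z) in metres in a shared task frame.
Waypoint = Tuple[float, float, float]

# Brain layer (hypothetical stand-in for the catering large model):
# maps an order to a plan made of atomic skill names.
RECIPES: Dict[str, List[str]] = {
    "latte": ["grab_cup", "pull_espresso", "pour_milk", "place_cup"],
    "ice_cream": ["grab_cup", "dispense_soft_serve", "add_topping", "place_cup"],
}

# Cerebellum layer (hypothetical): each atomic skill expands into waypoints
# in the shared task frame, independent of any particular robot body.
SKILLS: Dict[str, List[Waypoint]] = {
    "grab_cup": [(0.25, 0.5, 0.125), (0.25, 0.5, 0.0)],
    "pull_espresso": [(0.5, 0.25, 0.25)],
    "pour_milk": [(0.5, 0.5, 0.25), (0.5, 0.5, 0.125)],
    "dispense_soft_serve": [(0.75, 0.25, 0.25)],
    "add_topping": [(0.75, 0.5, 0.25)],
    "place_cup": [(1.0, 0.0, 0.125), (1.0, 0.0, 0.0)],
}


@dataclass
class BodyAdapter:
    """Body adaptation layer: re-expresses shared-frame waypoints for one body."""
    name: str
    base_offset: Waypoint  # where this body's base sits in the shared task frame

    def to_body_frame(self, wp: Waypoint) -> Waypoint:
        x, y, z = wp
        ox, oy, oz = self.base_offset
        return (x - ox, y - oy, z - oz)


def plan(order: str) -> List[str]:
    """Brain layer: choose the sequence of atomic skills for an order."""
    return RECIPES[order]


def execute(order: str, body: BodyAdapter) -> List[Waypoint]:
    """Run brain -> cerebellum -> body adapter and return body-frame targets."""
    targets: List[Waypoint] = []
    for skill in plan(order):
        for wp in SKILLS[skill]:
            targets.append(body.to_body_frame(wp))
    return targets
```

Calling `execute("latte", BodyAdapter("coffee_arm", (0.0, 0.0, 0.0)))` and `execute("ice_cream", BodyAdapter("ice_cream_arm", (0.25, 0.0, 0.0)))` reuses the same `grab_cup` and `place_cup` skills on two different bodies; only the adapter changes, which is the kind of reuse the "one brain, multiple forms" design describes.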
This architecture lets the robots sidestep a current industry pain point: the poor generalization of VLA (Vision-Language-Action) models.
Tang Mu told Hard Kr that VLA models are widely seen as generalizing poorly because people try to make a single model handle completely different tasks, such as folding clothes and cooking. The strategy of "Yingzhi XBOT" is to constrain the scenarios and call the VLA model only as a fallback when something abnormal happens, which saves computing power and keeps operation stable.
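The "VLA only as a fallback" strategy can be illustrated as a dispatcher that runs cheap, deterministic skills and escalates to an expensive model only when a skill reports an anomaly. This is a hypothetical Python sketch, not the company's code: the skill callables and the `vla_fallback` function are placeholders assumed for illustration, and a real anomaly check would come from sensors rather than a boolean return value.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A scripted skill returns True when it completes normally (hypothetical interface).
Skill = Callable[[], bool]


@dataclass
class Executor:
    skills: Dict[str, Skill]
    # Hypothetical VLA policy, invoked only when a scripted skill fails.
    vla_fallback: Callable[[str], bool]
    vla_calls: int = 0
    log: List[str] = field(default_factory=list)

    def run(self, plan: List[str]) -> bool:
        """Execute a plan; return False if even the fallback cannot recover."""
        for name in plan:
            if self.skills[name]():
                # Normal path: deterministic skill, no model inference needed.
                self.log.append(f"{name}:scripted")
                continue
            # Anomaly: escalate to the costly, more general model.
            self.vla_calls += 1
            if self.vla_fallback(name):
                self.log.append(f"{name}:vla")
            else:
                self.log.append(f"{name}:abort")
                return False
        return True
```

Because the fallback is counted separately, an operator could track how rarely the expensive model is actually triggered, which is the compute saving Tang Mu describes.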
In addition, "Yingzhi XBOT" has built a complete data flywheel.
The system is pre-trained in simulation in the lab, then refined with online reinforcement learning in real commercial settings, continuously improving on real operating data from more than 4 million cups of coffee. It also uses RLHF (Reinforcement Learning from Human Feedback), aligning the model with more than 3 million human-preference data points to keep improving its performance.
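The article does not say how "Yingzhi XBOT" implements RLHF. In the common formulation, the first step fits a reward model to pairwise human preferences under the Bradley-Terry model, where the probability that drink A is preferred over drink B is the sigmoid of their reward difference. Below is a minimal pure-Python sketch of that step only, assuming a linear reward over hypothetical numeric drink features (for example, foam texture or serving temperature scores); the later policy-optimization stage is not shown.

```python
import math
from typing import List, Sequence, Tuple

Features = Sequence[float]


def reward(w: Sequence[float], x: Features) -> float:
    """Linear reward: weighted sum of (hypothetical) drink features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_reward_model(
    pairs: List[Tuple[Features, Features]],
    dim: int,
    lr: float = 0.1,
    epochs: int = 200,
) -> List[float]:
    """Fit weights so preferred items score higher (Bradley-Terry model).

    Each pair is (preferred, rejected). The loss per pair is
    -log(sigmoid(r(preferred) - r(rejected))), minimized by gradient descent.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            p = 1.0 / (1.0 + math.exp(-margin))  # P(chosen preferred)
            # Gradient of the loss w.r.t. w is -(1 - p) * (chosen - rejected).
            for i in range(dim):
                w[i] += lr * (1.0 - p) * (chosen[i] - rejected[i])
    return w
```

If every preferred drink differs from the rejected one only in the first feature, training pushes that weight up and leaves the other at zero, so the learned reward ranks new drinks the way the raters did.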
With its model and data infrastructure in place, "Yingzhi XBOT" recently unveiled four product lines covering different tiers of embodied catering needs.
XBOT C3 Coffee Robot: Occupies 1.83 m² and uses a 6-axis robotic arm with a repeat positioning accuracy of ±0.02 mm. It makes 80 cups of coffee per hour and can make 150 cups continuously on a full charge. It comes with a 43-inch digital-human screen and the "Aibao Store Manager" agent, and is priced at 219,000 yuan.
XBOT I3 Ice-cream Robot: More compact, it makes more than 60 cups per hour with a production failure rate below 0.5%. It has a designed service life of 250,000 cups, supports 4 kinds of sauces and 4 kinds of crunchy toppings, and is priced at 179,000 yuan.
XBOT X1 General Catering Humanoid Robot: Has dual 7-axis arms with dual-arm coordination accuracy within ±1 mm, and a single Huixi R1 chip providing 500 TOPS of on-board computing power. It can close the full loop of picking up, making, placing, and delivering. Mass production and launch are planned for the end of 2026.
XBOT CUBE Robot Food Truck: Carries a 20 kWh battery, occupies 8 m², and supports coffee, ice cream, cocktails, and sausage grilling. The launch date has not been set.
Tang Mu told Hard Kr that because humanoid and semi-humanoid robots are not the optimal solution for every scenario, dedicated single-arm models and general-purpose semi-humanoid models will develop in parallel.
To secure production capacity, the "Yingzhi XBOT" bases in Nanjing, Yueyang, Shanghai, and Beijing Yizhuang currently have a combined annual capacity of 20,000 units.
Technology, however, must serve a closed business loop. Tang Mu compares the business model of "Yingzhi XBOT" to driving for Didi.
He cites a Lite-series coffee robot placed in a shopping mall in Yiwu, Zhejiang: the equipment costs more than 100,000 yuan, averages about 200 cups a day at an average ticket of about 20 yuan, and generates more than 60,000 yuan in monthly revenue and more than 30,000 yuan in monthly net profit. He puts the payback period at just 6 to 8 months. With a designed service life of 5 years, the time remaining after payback is essentially pure profit.
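The payback arithmetic is simply purchase price divided by monthly net profit. Because the article gives only lower bounds ("more than 100,000 yuan" and "more than 30,000 yuan"), the exact result depends on the real figures: 100,000 / 30,000 would be about 3.3 months, so the quoted 6-to-8-month range implies a higher all-in cost or lower net profit than those bounds alone. The Python sketch below takes both values as inputs; the 5-year life comes from the article, while the 210,000-yuan price in the usage note is a hypothetical value chosen only to illustrate.

```python
from dataclasses import dataclass


@dataclass
class RobotInvestment:
    price_yuan: float               # up-front equipment cost paid by the operator
    monthly_net_profit_yuan: float  # monthly revenue minus all running costs
    service_life_months: int = 60   # 5-year design life, as stated in the article

    def payback_months(self) -> float:
        """Months of net profit needed to recover the purchase price."""
        return self.price_yuan / self.monthly_net_profit_yuan

    def profit_months_after_payback(self) -> float:
        """Remaining service life once the machine has paid for itself."""
        return self.service_life_months - self.payback_months()

    def lifetime_net_yuan(self) -> float:
        """Cumulative profit over the design life, net of the purchase price."""
        return self.monthly_net_profit_yuan * self.service_life_months - self.price_yuan
```

For example, `RobotInvestment(210_000, 30_000)` pays back in 7.0 months, leaving 53 months of the 60-month design life as profit.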
As he puts it, a Didi driver spends more than 100,000 yuan on a car and is tied to it all day, whereas a robot costing the same only needs to be refilled and maintained once a day. In business terms, the coffee robot is simply a more efficient means of production.
For this reason, XBOT does not use the leasing model common in the industry; Tang Mu believes leasing is ill-suited to the robot era. Instead, "Yingzhi XBOT" promotes a RaaS (Robot as a Service) model.
According to Tang Mu, franchisees will in future pay for the machine up front and then pay three monthly fees: a materials subscription fee (coffee beans, dairy, etc.), an Agent Token fee (for the Aibao Store Manager digital-human service), and a maintenance fee.
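The RaaS structure can be expressed as a franchisee's monthly net: revenue minus the three recurring fees. The article gives no fee amounts, so this Python sketch takes them as inputs; the fee keys and the figures in the usage note are hypothetical, and other costs a franchisee might bear (such as venue rent) are deliberately left out because the article does not mention them.

```python
from typing import Dict

# The three recurring fees named in the article (key names are hypothetical).
REQUIRED_FEES = {"materials", "agent_token", "maintenance"}


def monthly_net_profit(revenue_yuan: float, fees_yuan: Dict[str, float]) -> float:
    """Franchisee's monthly net after the three recurring RaaS fees."""
    missing = REQUIRED_FEES - set(fees_yuan)
    if missing:
        raise ValueError(f"missing fee(s): {sorted(missing)}")
    return revenue_yuan - sum(fees_yuan.values())
```

With hypothetical inputs of 60,000 yuan revenue and fees of 18,000 (materials), 2,000 (agent tokens), and 3,000 (maintenance), the monthly net would be 37,000 yuan.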
Notably, "Yingzhi XBOT" is currently the only company in the industry holding a "National Full-Category Food Business License", which lets it legally control the supply chain and thereby lock in the RaaS closed loop.
On the customer side, "Yingzhi XBOT" targets cross-industry customers outside the coffee-chain sector, such as luxury stores, home-furnishing stores, and 4S car dealerships.
"Yingzhi XBOT" also provides solutions for JD Seven-Fresh Coffee, Yizhuang Robot, FICO Robot (FICO Coffee), Jilv Holdings (Jilin Ice and Snow Culture and Tourism), and Bowutianxing (cultural and museum venues), and is jointly exploring overseas markets with an Asian coffee brand that has more than 4,000 stores.
To date, more than 1,000 "Yingzhi XBOT" coffee robots have been deployed in more than 100 cities worldwide, producing more than 4 million cups of coffee. In 2025, the company's revenue exceeded 100 million yuan. Tang Mu told Hard Kr that its orders in hand for 2026 are approaching 300-500 million yuan.
The company's ability to close the loop on technology, product, and business within just four years of its founding is credited to its core team, its "Iron Army".
Interestingly, the team's makeup differs from that of a typical robot company; it looks more like a combination of a mature consumer-goods company and an AI company.
The robotics side of "Yingzhi XBOT" is led by Wang Jiali, who holds a PhD in mechatronic engineering from Harbin Institute of Technology and has held senior executive roles at companies including Aerospace Science and Industry Corporation, Sany Group, and Siasun Robot & Automation Co., Ltd.
The operations side is staffed by former Luckin Coffee team members, including Cao Ruikun and Yu Tao, who were responsible for large-scale regional chain-store operations and for building brand standardization and chain training systems.
The following is an excerpt from Hard Kr's conversation with Tang Mu:
Hard Kr: What led you to choose commercial scenarios?
Tang Mu: My reasoning was very linear. First, I ruled out the consumer (toC) household scenario. Looking at the industry at the time, team after team was training robots to fold clothes and cook, burning a lot of money without satisfactory results. The conclusion was that, apart from narrowly scoped categories like robot vacuums, general household robots are not feasible in the short term.
Second, I ruled out the toB industrial scenario. In lights-out factories, traditional robotic arms are already efficient and stable enough. There is simply no room for a clumsy, unpredictable humanoid robot that would slow things down. Industry needs efficiency, not a humanoid form.
Third, I focused on toB commercial scenarios. Since homes and factories were not suitable, the question became: what is the largest, highest-frequency, fastest-growing business in the commercial space? Working it through, coffee was the best answer. It is a globally popular, high-frequency beverage, highly standardized, and in a period of rapid growth. So our decision to build coffee robots was not a flash of inspiration but the inevitable result of logical deduction after eliminating the wrong answers.
Hard Kr: Since the start of this year, humanoid-robot valuations have soared, yet you insist on non-humanoid vertical products. How do you see this choice of form?
Tang Mu: Elon Musk has argued that the humanoid form follows from first principles, but in many workplaces it is actually a burden. Legs alone consume a lot of computing power and electricity, and standing upright just to "look human" does not meet any real need. Our logic is that in many situations non-humanoid forms can do things humans can't, and that matters more.
Hard Kr: For hardware, you insist on mature supply-chain components such as six-axis industrial arms, grippers, and harmonic reducers. Why?
Tang Mu: The first principle in commercial settings is stability, not showing off. A five-fingered dexterous hand may be rated for 100,000 open-close cycles, but if it misses in a shopping mall and spills the coffee, that day's business suffers. So we only use grippers that have been proven in industry. The same goes for joints: planetary reducers are only suited to coarse movements, while harmonic reducers win on repeat positioning accuracy, mass-production cost, and service life. Business does not tolerate the "approximately right" that passes in a lab; it demands reliable precision.
Hard Kr: Now many companies are talking about "general embodied intelligence", but you limit the scenario to catering. Will this limit future imagination?
Tang Mu: Imagination does not depend on how broad the scenario is but on whether you can go deep into it. For a while people said "VLA is dead" because they wanted one model to fold clothes and cook at the same time, which really is hard in the general domain. But in a vertical domain, where the tasks are constrained, it works well. Think of the robotic arms in lights-out factories: they don't even need eyes. They just need to do a single task well at low cost.
Most of the time we don't even trigger the VLA; we call it only when something abnormal happens, to save computing power. In the future, almost all products will be embodied, but not all of them will become robots. So the goal of "Yingzhi XBOT" is to go deep into the catering vertical and, through an "operations - data - model" flywheel, become a SaaS company for the era of embodied intelligence.
Cover image source | Provided by the company
Layout | Fan Xinya
Feel free to get in touch