Yingke Exclusive | Former Xiaomi Executive Tang Mu Launches Coffee Robot Startup, Raises Hundreds of Millions of Yuan with Backing from Lin Bin and Li Wanqiang

Qiu Xiaofen, 2026-07-05 14:23
Last year's revenue exceeded 100 million yuan.

Author | Qiu Xiaofen

Editor | Yuan Silai

Yingke has learned that "Yingzhi XBOT", a company building general-purpose embodied robots for catering, has completed two consecutive financing rounds totaling hundreds of millions of yuan. The 200 million yuan Series A came from GPTX, a Hong Kong investment firm; the Series B, of 300 to 500 million yuan, was jointly funded by several government funds, US-dollar funds, and industrial investors.

This is among the largest financings to date in the catering robot vertical.

Before this, "Yingzhi XBOT" had also closed an angel round with a high-profile lineup of investors: Zhang Xiaolong, senior vice-president of Tencent; Xiaomi co-founders Li Wanqiang, Huang Jiangji, Lin Bin, and Hong Feng; and Guo Yike, chief vice-president of the Hong Kong University of Science and Technology.

"Yingzhi XBOT" was founded in 2022. Its founder, Tang Mu, is something of an "outsider" in robotics: he is one of the industry's rare product managers turned CEO.

Previously, he served as general manager of Kingsoft Software and Tencent CDC, responsible for the experience design of products such as QQ and QZone. He later became vice-president of the Xiaomi Ecosystem Chain, leading the launch of benchmark best-sellers such as the Xiaomi router and the Xiaomi Smart Speaker. In total he has 25 years of product experience.

However, while the entire embodied-intelligence sector is captivated by the narrative of humanoid robots entering homes and factories, "Yingzhi XBOT" has chosen a path that looks less "cool" but allows rapid productization and commercial deployment: having robots make coffee in the corners of shopping malls.

Unlike most mainstream humanoid robot companies, "Yingzhi XBOT" designs its technical architecture around vertical catering scenarios. At its core is the "XOS 3.0 Embodied Operating System", which adopts a one-brain, multiple-forms architecture.

According to Tang Mu, the system is divided into three layers:

Brain layer: Responsible for high-level cognition and task planning. It runs the Zhiwei Catering Large Model, which is trained on a DeepSeek base model and incorporates real production data from 4 million cups of coffee.

Cerebellum layer: Responsible for converting semantic instructions into joint-level control signals, with a response time under 10 milliseconds. It has accumulated more than 50 atomic skills for catering actions.

Body adaptation layer: Enables reuse across robot forms ("develop once, reuse in multiple forms"), so that the control logic of the coffee arm can be migrated to other "Yingzhi XBOT" robots, such as the ice-cream and cocktail-making robots.
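
The three layers described above can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the recipe table stands in for the large model's planning, and the skill names, joint names, and angles are hypothetical rather than taken from XOS 3.0.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class JointCommand:
    joint: str        # logical joint name until the adaptation layer rewrites it
    angle_deg: float


# Brain layer: a lookup table standing in for the large model's task planning.
RECIPES: Dict[str, List[str]] = {
    "americano": ["grab_cup", "brew_espresso", "serve"],
    "latte": ["grab_cup", "brew_espresso", "pour_milk", "serve"],
}

# Cerebellum layer: atomic skills expressed as body-independent joint targets.
SKILLS: Dict[str, List[JointCommand]] = {
    "grab_cup": [JointCommand("shoulder", 30.0), JointCommand("gripper", 10.0)],
    "brew_espresso": [JointCommand("wrist", 90.0)],
    "pour_milk": [JointCommand("wrist", 45.0)],
    "serve": [JointCommand("shoulder", 0.0), JointCommand("gripper", 60.0)],
}


@dataclass
class BodyAdapter:
    """Adaptation layer: maps logical joints onto one robot's physical joints."""
    name: str
    joint_map: Dict[str, str]

    def adapt(self, commands: List[JointCommand]) -> List[JointCommand]:
        return [JointCommand(self.joint_map[c.joint], c.angle_deg) for c in commands]


def run_order(order: str, body: BodyAdapter) -> List[JointCommand]:
    """Plan the order, expand each skill, and translate it for the given body."""
    out: List[JointCommand] = []
    for skill in RECIPES[order]:
        out.extend(body.adapt(SKILLS[skill]))
    return out


# Two hypothetical robot bodies that share the same recipes and skills.
COFFEE_ARM = BodyAdapter("coffee_arm", {"shoulder": "J2", "wrist": "J5", "gripper": "G1"})
ICE_CREAM_ARM = BodyAdapter("ice_cream_arm", {"shoulder": "A1", "wrist": "A4", "gripper": "A6"})
```

In this sketch only the `joint_map` differs between the two bodies, which is one simple way to read "develop once, reuse in multiple forms"; a real system would also have to handle differing kinematics and tooling.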

This architecture lets the robots sidestep a current industry pain point: VLA (Vision-Language-Action) models generalize poorly.

Tang Mu told Yingke that the industry widely regards VLA models as generalizing poorly because a single model is asked to handle entirely different tasks, such as folding clothes and cooking, at the same time. The strategy of "Yingzhi XBOT" is to constrain the scenario and call the VLA model only as a fallback when something abnormal happens, which saves computing power and keeps operation stable.
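
The fallback strategy described here can be sketched roughly as follows. This is a minimal illustration, not the company's implementation: the function names, the anomaly check, and the demo steps are all hypothetical, and the "VLA" is just a stub.

```python
from typing import Callable, List, Tuple


def run_with_fallback(
    steps: List[str],
    scripted: Callable[[str], str],
    is_abnormal: Callable[[str], bool],
    vla_fallback: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Run each step with the cheap scripted skill; call the VLA only on anomalies."""
    log: List[Tuple[str, str]] = []
    for step in steps:
        result = scripted(step)
        if is_abnormal(result):
            vla_fallback(step)  # expensive path, expected to be rare
            log.append((step, "vla"))
        else:
            log.append((step, "scripted"))
    return log


# Hypothetical stand-ins for illustration only.
def demo_scripted(step: str) -> str:
    # Pretend the cup tips over during the grab step.
    return "cup_tipped" if step == "grab_cup" else "ok"


def demo_is_abnormal(result: str) -> bool:
    return result != "ok"


def demo_vla(step: str) -> str:
    # A real VLA model would perceive the scene and recover; this stub just succeeds.
    return "ok"


log = run_with_fallback(
    ["grab_cup", "brew_espresso", "serve"], demo_scripted, demo_is_abnormal, demo_vla
)
vla_calls = sum(1 for _, path in log if path == "vla")
```

In the demo run the expensive model is invoked for one step out of three; in a constrained scenario the share would presumably be far lower, which is where the computing-power saving comes from.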

In addition, "Yingzhi XBOT" has also built a complete data flywheel.

The system combines simulation pre-training in the laboratory with online reinforcement learning in real commercial settings, and is continuously optimized on real operating data from more than 4 million cups. Through RLHF (Reinforcement Learning from Human Feedback), it is also aligned against more than 3 million human preference data points to keep improving model performance.

With this model and data foundation in place, "Yingzhi XBOT" recently released four product lines to meet different levels of demand for embodied robots in catering.

XBOT C3 Coffee Robot: It occupies 1.83 square meters and uses a 6-axis robotic arm with a repeat positioning accuracy of ±0.02 mm. It can produce 80 cups of coffee per hour and make 150 cups in a row when fully charged. It comes with a 43-inch digital human screen and the "Aibao Store Manager" Agent, and is priced at 219,000 yuan.

XBOT I3 Ice-cream Robot: It is more compact, produces more than 60 cups per hour, and has a production failure rate below 0.5%. It has a design service life of 250,000 cups, supports 4 sauces and 4 crunchy toppings, and is priced at 179,000 yuan.

XBOT X1 General Catering Humanoid Robot: It has two 7-axis arms with a dual-arm coordination accuracy within ±1 mm, and a single Huixi R1 chip providing 500 TOPS of on-board computing power. It is designed to close the full loop of picking, making, placing, and delivering. Mass production and launch are planned for the end of 2026.

XBOT CUBE Robot Food Truck: It carries a 20 kWh battery, occupies 8 square meters, and supports coffee, ice cream, cocktails, and grilled sausages. The launch date is to be determined.

Tang Mu told Yingke that, because humanoid and semi-humanoid forms are not the optimal solution for every scenario, dedicated single-arm models and general-purpose semi-humanoid models will develop in parallel.

To secure future supply, the company's production bases in Nanjing, Yueyang, Shanghai, and Yizhuang, Beijing, currently have a combined annual capacity of 20,000 units.

However, technology must serve a closed commercial loop. Tang Mu uses the analogy of "driving for Didi" to describe the business model of "Yingzhi XBOT".

By his account, take a Lite series coffee robot deployed in a shopping mall in Yiwu, Zhejiang: the machine sells for more than 100,000 yuan, makes about 200 cups a day at an average ticket of about 20 yuan, and brings in more than 60,000 yuan in revenue a month, of which more than 30,000 yuan is net profit. By this calculation the payback period is only 6 to 8 months. The machine has a design service life of 5 years, so the time remaining after payback is essentially pure profit.
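
The payback arithmetic can be made explicit with a short calculation. The article gives only lower bounds (a machine price of "more than 100,000 yuan" and net profit of "more than 30,000 yuan" a month), so the upfront figures below are assumptions chosen to show what total outlay a 6 to 8 month payback would correspond to at 30,000 yuan a month; they are not numbers from the company.

```python
def payback_months(upfront_cost_yuan: float, monthly_net_profit_yuan: float) -> float:
    """Months needed for cumulative net profit to cover the upfront cost."""
    if monthly_net_profit_yuan <= 0:
        raise ValueError("monthly net profit must be positive")
    return upfront_cost_yuan / monthly_net_profit_yuan


def profit_months(service_life_years: float, payback: float) -> float:
    """Months of service life left once the upfront cost has been recovered."""
    return service_life_years * 12 - payback


MONTHLY_NET_PROFIT = 30_000   # article: "more than 30,000 yuan"
ASSUMED_UPFRONT_LOW = 180_000   # hypothetical total outlay, not from the article
ASSUMED_UPFRONT_HIGH = 240_000  # hypothetical total outlay, not from the article

fast = payback_months(ASSUMED_UPFRONT_LOW, MONTHLY_NET_PROFIT)
slow = payback_months(ASSUMED_UPFRONT_HIGH, MONTHLY_NET_PROFIT)
remaining = profit_months(5, slow)
```

Under these assumptions the payback lands at 6 and 8 months respectively, leaving 52 or more months of the 5-year design life after the outlay is recovered.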

Tang Mu's Didi analogy runs as follows: a Didi driver spends more than 100,000 yuan on a car and is then tied to the vehicle all day, whereas a robot at the same price only needs restocking and maintenance once a day. In business terms, the coffee robot is essentially a more efficient means of production.

For this reason, "Yingzhi XBOT" does not adopt the leasing model common in the industry, which Tang Mu believes is ill-suited to the robot era. Instead, the company promotes a RaaS (Robot as a Service) model.

According to Tang Mu, franchisees will in future pay for the machine up front and then pay three monthly fees: a materials subscription fee (coffee beans, dairy products, and so on), an Agent Token fee (for the Aibao Store Manager digital human service), and a maintenance fee.

Notably, "Yingzhi XBOT" is currently the only company in the industry to hold the "National Full-category Food Business License Qualification", which allows it to control the supply chain legally and thereby lock in the RaaS closed loop.

In terms of customer mix, "Yingzhi XBOT" targets cross-industry customers outside the coffee chains, such as luxury stores, home furnishing stores, and 4S car dealerships.

At the same time, "Yingzhi XBOT" provides solutions for JD 7Fresh Coffee, Yizhuang Robot, FICO Robot (FICO Coffee), Jilv Holdings (Jilin Ice and Snow Tourism), and Bowu Tianxing (Museum Venues), and is jointly exploring overseas markets with an Asian coffee brand that has more than 4,000 stores.

To date, more than 1,000 "Yingzhi XBOT" coffee robots have been deployed in more than 100 cities worldwide, producing more than 4 million cups of coffee. The company's revenue exceeded 100 million yuan in 2025. Tang Mu told Yingke that its orders in hand for 2026 amount to roughly 300 to 500 million yuan.

That the company has closed the loop on technology, product, and business within just four years of its founding is down to its "iron-clad" core team.

Interestingly, the team makeup of "Yingzhi XBOT" differs from that of a typical robot company; it looks more like a mature consumer goods company combined with an AI company.

The company's robot technology division is led by Wang Jiali, who holds a doctorate in mechanical and electrical engineering from Harbin Institute of Technology and has held senior executive posts at Aerospace Science and Industry, Sany Group, and Siasun Robot & Automation Co., Ltd.

The operations division is staffed by a team of Luckin Coffee alumni, including Cao Ruikun and Yu Tao, who have run chain operations for stores across large regions and built brand standardization and chain training systems.

The following is an excerpt from the dialogue between Yingke and Tang Mu:

Yingke: What led you to choose the commercial scenario?

Tang Mu: The reasoning is very straightforward. First, I ruled out the consumer (toC) household scenario. Looking at the industry at the time, team after team was training robots to fold clothes and cook, burning a lot of money for unsatisfactory results. My conclusion was that, apart from categories with strictly defined uses such as robot vacuums, general-purpose household robots are not feasible in the short term.

Second, I ruled out the toB industrial scenario. In a lights-out factory, traditional robotic arms are already efficient and stable enough; there is simply no room for a clumsy, unpredictable humanoid robot that slows things down. Industry does not need the humanoid form; it needs efficiency.

Third, I targeted the toB commercial scenario. With households and factories ruled out, the task was to find the largest, highest-frequency, and still-growing business in the commercial sector. Worked through, coffee is the optimal answer: a globally popular, high-frequency beverage that is highly standardized and growing rapidly. So our decision to build coffee robots was not a flash of inspiration but the inevitable result of logical deduction after eliminating the wrong answers.

Yingke: Valuations of humanoid robot companies have soared this year, yet you have stuck with non-humanoid vertical products. How do you view this choice of form?

Tang Mu: Elon Musk says the humanoid form follows from first principles, but in many workplaces the humanoid form is actually a burden. Long legs alone consume a great deal of computing power and electricity, and standing upright just to "look like a human" does not solve any real need. Our logic is that non-humanoid forms can, in many settings, complete tasks that humans cannot, which matters more.

Yingke: On hardware, you have insisted on mature supply-chain components such as 6-axis industrial arms, grippers, and harmonic reducers. Why?

Tang Mu: The first principle in commercial scenarios is stability, not showing off. A five-finger dexterous hand may be nominally rated for 100,000 open-close cycles, but if it misses in a shopping mall and spills the coffee, the day's business suffers. So we use only grippers that have been proven in industry. The same goes for joints: planetary reducers are suitable only for "rough movements", while harmonic reducers win on repeat positioning accuracy, mass-production cost, and service life. Business does not accept the "approximately" of the laboratory; it must pursue the "precisely" with certainty.

Yingke: Many companies now talk about "general embodied intelligence", but you have confined yourselves to catering. Will this limit your room for imagination?

Tang Mu: Imagination depends not on how broad the scenario is but on whether you can go deep into it. For a while people said "VLA is dead", because they wanted it to both fold clothes and cook, which is indeed difficult in the general domain. In a vertical domain, where the tasks are constrained, it works well. It is like the robotic arms in a lights-out factory: they do not even need eyes; they just need to do a single task well at low cost.

Most of the time we would rather not trigger the VLA at all, and we call it only when something abnormal happens, to save computing power. In the future almost all products will be embodied, but not all of them will become robots. The goal of "Yingzhi XBOT" is therefore to go deep into the catering vertical and, through an "operation-data-model" flywheel, become a SaaS company of the embodied-intelligence era.

Cover image | Provided by the company

Typesetting | Fan Xinya