
Dialogue with Yang Fengyu, CEO of Unix AI: This post-2000s founder isn't betting on VLA. Instead, he's sending robots to work in hotels first.

Fu Chong, 2025-08-28 15:12
First, enter scenarios where the technology can already be deployed at scale. Then, during actual operations, keep iterating through the data flywheel. This is both Unix AI's technology strategy and its business strategy.

Text by | Fu Chong

Edited by | Su Jianxun

“Since the Robot Games ended, our 400 customer hotline has been ringing non-stop. In the second week after the competition, more than a dozen hotel clients came to visit us,” Yang Fengyu, founder and CEO of Unix AI, told Intelligent Emergence.

At the World Humanoid Robot Games in early August, Unix AI's robots won two gold medals and one silver medal in the hotel cleaning and greeting service events. In the hotel...

The results drew the attention of hotel operators, nursing-home operators and other businesses to Unix AI.

Both events test a robot's generalization ability, fine manipulation skills and movement speed. In the cleaning event, the robot must pick up various items scattered around a room in the shortest possible time. In the greeting event, it must grab a “guest's” suitcase and carry it quickly to the finish line.

The robots were able to win medals because Unix AI had already deployed them in “quasi-consumer-end” cleaning scenarios such as hotels, where they accumulated data and operational experience while working.

In hotel rooms, Unix AI's robots clean, tidy up and collect garbage. Although they still work more slowly than human cleaners, cleaning is relatively tolerant of errors, so the robots can take their time behind closed doors.

In Yang Fengyu's view, skills honed in “quasi-consumer-end” scenarios can later be transferred to business-facing (B-end) and consumer-facing (C-end) tasks in homes, restaurants, fast-food outlets, cafés and elsewhere.

Unix AI's robots have now entered small-batch delivery, with orders signed with several hotel groups, property management companies and senior-care communities.

Because the industry currently lacks data, Unix AI did not choose the more mainstream end-to-end VLA (Vision-Language-Action) approach.

Instead, Unix AI decomposes the actions a scenario requires into key points and movement trajectories, then learns them through imitation learning.

This way, the robot can learn the trajectory for a whole type of action from a small amount of action data. It is then deployed first in scenarios where it can be used in large numbers, and keeps improving through the data flywheel while on the job.

This is also Unix AI's business strategy.

Unix AI's second-generation and third-generation Wanda robots receiving awards at the Robot Games. Photo provided by the interviewee.

Yang Fengyu was born in 2000. He earned a bachelor's degree in computer science from the University of Michigan and was pursuing a Ph.D. in computer science at Yale University. In 2024, he put his doctoral studies on hold and founded Unix AI.

In his view, over the past 20 years Chinese companies have ultimately come to dominate every hardware-related field. That is why he seized the current opportunity in embodied intelligence and returned to China to start a business.

Intelligent Emergence recently interviewed Yang Fengyu about Unix AI's views on commercialization, technology and other topics. He also shared details of the yet-to-be-launched third-generation Wanda robot.

The following is taken from the interview and has been edited by the author.

Yang Fengyu, the founder and CEO of Unix AI. Photo provided by the interviewee.

Exploring the “Data Flywheel” in “Quasi-Consumer-End” Scenarios

Intelligent Emergence: Unix AI's robots won two gold medals and one silver medal at the Robot Games. What subsequent impacts did this have on the company?

Yang Fengyu: As soon as the competition ended, our 400 customer hotline was flooded with calls. In the second week after the competition, more than a dozen hotel clients came to visit us as a group.

Although the work-related events did not draw much on-site attention, and we didn't even make it onto the big screen, the results still had an impact on clients.

At the same time, we also improved the robots' capabilities during the preparation for the competition.

For example, under the original rules of the hotel greeting event, the robot had to pick up the suitcase, place it on a luggage cart and then push the cart to a designated spot. The difficulty is that the robot's direction of travel may not line up with the direction in which it drags the cart, which raises many hardware problems that need to be solved.

We spent more than a month iterating on the hardware for this. Although the luggage-cart task was later dropped, I'm very grateful for it, because it made our robots better.

The third-generation Wanda robot pulls a suitcase with both hands in the hotel greeting event. Photo provided by the interviewee.

Intelligent Emergence: You mentioned that Unix AI's robots have entered the hotel scenario to collect data while working. Why focus on this scenario?

Yang Fengyu: We consider hotel cleaning a “quasi-consumer-end” skill. Atomic actions mastered in this scenario, such as cleaning, tidying up and collecting garbage, can be transferred to homes, nursing homes, restaurants, fast-food outlets and cafés.

Data from hotel cleaning can also be uploaded back to us. Unlike in industrial settings, confidentiality requirements are not as strict, which is very helpful for training the robot model.

Other advantages of hotel cleaning are its high tolerance for errors, which lets the robots work slowly behind closed doors, and the lower potential risk from human-robot interaction.

Intelligent Emergence: So, winning the competition was more due to usual accumulation?

Yang Fengyu: Yes. The hotel cleaning competition required the robots to enter a simulated scenario and pick up scattered bottles, boxes and other items, which is something that Unix AI's robots are already good at.

In fact, our robots can perform tasks that are more difficult than those in the competition, such as collecting garbage, packing garbage bags, making beds and cleaning bathrooms.

Intelligent Emergence: Unix AI's approach is to collect data from real work while mass-producing and delivering products. Why do it this way?

Yang Fengyu: Unix AI follows the Tesla model: deploy enough robots in real scenarios first, then accumulate enough data through the “data flywheel.”

The advantage of this approach is that the barrier to training is very low. We don't even need algorithm engineers; deployment engineers can handle it.

I believe the Scaling Law, the observation that a quantitative increase in data produces a qualitative change in large language models, can also be replicated in embodied intelligence. But how you scale matters a great deal.

First, data quality and diversity both matter, and diversity matters even more. I'd rather have 100 million data points that follow the “natural distribution” than a “small batch” drawn from an artificial distribution. To collect as much naturally distributed data as possible, you can't just hire people to collect data every day; the only way is to collect it in real-world scenarios.

Second, the quantity of data has to be large enough. In images and text, training a multimodal large language model requires data on the order of billions of samples.

In autonomous driving, the field closest to embodied intelligence, running an L4 or near-L4 model requires at least hundreds of thousands of vehicles on the road, and that assumes all of the data is clean.

In robotics, I think at least the same order of magnitude is needed. Without hundreds of thousands of robots in operation, it's impossible to develop a really good model.
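Yang does not say how Unix AI measures diversity. As a minimal sketch, assuming diversity is approximated by how evenly a dataset covers object or task categories, Shannon entropy gives one simple number for it. The category labels and counts below are hypothetical, not Unix AI data.

```python
import math
from collections import Counter


def category_entropy(labels):
    """Shannon entropy, in bits, of the category mix in a dataset.

    Higher values mean samples are spread more evenly across more categories.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical datasets: a scripted collection dominated by one object,
# versus field data that happens to cover eight object types evenly.
scripted = ["bottle"] * 90 + ["box"] * 10
field_data = ["bottle", "box", "towel", "cup", "slipper", "wrapper", "can", "remote"] * 10

print(f"scripted: {category_entropy(scripted):.2f} bits")
print(f"field:    {category_entropy(field_data):.2f} bits")
```

This prints about 0.47 bits for the scripted set and 3.00 bits for the field set. Entropy says nothing about how much data there is, which is the separate “quantity” point Yang makes.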

To cut its completion time, the second-generation Wanda robot works with both hands in the hotel cleaning event. Photo provided by the interviewee.

Not Betting on VLA, but Insisting on Full-Stack In-House Development

Intelligent Emergence: I heard that you ran into a problem with the “closing the door” action during the competition but solved it quickly. How did you manage that?

Yang Fengyu: Closing a door is inherently hard for robots. This hinged motion requires handling sideways movement, coordinating whole-body angles and grasping the doorknob, all at the same time.

On the night of the opening ceremony, when we went to the venue for a rehearsal, we found that the doors in the hotel cleaning event were one meter wide.

That width was chosen so that robots with large chassis could get in and out, but it is wider than the doors in regular hotels and homes. Our robots have a smaller chassis, and our models and algorithms had been trained on standard hotel doors 75 to 80 centimeters wide. So our dual-arm door-closing strategy didn't work for the competition doors.

That night, we used VR equipment on-site to collect new data and retrain this atomic skill. The next morning we were the first team to compete, so there was no chance for a second adjustment.

Fortunately, we won without much trouble. Our in-house imitation-learning platform, UniFlex, played a big role. Its greatest strength is extremely high data efficiency: it can learn a new task from only 5 to 10 rounds of data collection.

Intelligent Emergence: Could you introduce UniFlex in detail?

Yang Fengyu: It's a model that decouples perception from manipulation. At its core is keypoint-based imitation learning.

We decompose an action into several key points and movement trajectories and learn them in topological space.

This approach is related to two main schools of robot motion generation, DMP (Dynamic Movement Primitives) and VMP (Via-points Movement Primitives). Although they have been mentioned less in recent years, they have enjoyed a revival since being combined with large models.

Put simply, from a small number of demonstrations the robot can learn the trajectory for a whole type of action. For example, when opening a door, it can still complete the action even if the door is different or its navigation is off by two centimeters to the left or three centimeters to the right.
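Yang names DMP as one of the movement-primitive families UniFlex relates to, but the interview does not describe UniFlex's internals. As an illustration of the general DMP idea only, not of UniFlex, the sketch below learns a one-dimensional trajectory from a single demonstration and replays it toward a goal shifted by 3 cm, the kind of tolerance to small offsets Yang describes. The gains (25 and 4), the 30 basis functions and the width heuristic are assumed textbook-style choices.

```python
import numpy as np


class DMP1D:
    """Minimal discrete Dynamic Movement Primitive for one degree of freedom.

    tau * dz/dt = alpha_z * (beta_z * (g - y) - z) + f(x)
    tau * dy/dt = z
    tau * dx/dt = -alpha_x * x
    The learned forcing term f is scaled by (g - y0), so the same shape
    can be replayed toward a different goal.
    """

    def __init__(self, n_basis=30, alpha_z=25.0, alpha_x=4.0):
        self.alpha_z = alpha_z
        self.beta_z = alpha_z / 4.0  # critically damped spring-damper
        self.alpha_x = alpha_x
        # Gaussian basis centres spaced evenly in time, expressed in phase x.
        self.c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
        self.h = n_basis ** 1.5 / self.c  # heuristic basis widths
        self.w = np.zeros(n_basis)

    def _psi(self, x):
        return np.exp(-self.h * (x - self.c) ** 2)

    def fit(self, y_demo, dt):
        """Learn forcing-term weights from one demonstrated trajectory."""
        y_demo = np.asarray(y_demo, dtype=float)
        self.y0, self.g = y_demo[0], y_demo[-1]
        self.tau = (len(y_demo) - 1) * dt
        dy = np.gradient(y_demo, dt)
        ddy = np.gradient(dy, dt)
        x = np.exp(-self.alpha_x * np.arange(len(y_demo)) * dt / self.tau)
        # Forcing term that would make the spring-damper reproduce the demo.
        f_target = self.tau ** 2 * ddy - self.alpha_z * (
            self.beta_z * (self.g - y_demo) - self.tau * dy)
        s = x * (self.g - self.y0)
        # Locally weighted regression: one weight per basis function.
        psi = np.exp(-self.h[:, None] * (x[None, :] - self.c[:, None]) ** 2)
        self.w = (psi @ (s * f_target)) / (psi @ (s * s) + 1e-10)
        return self

    def rollout(self, g=None, y0=None, dt=0.001):
        """Euler-integrate the primitive, optionally toward a new goal or start."""
        g = self.g if g is None else g
        y0 = self.y0 if y0 is None else y0
        y, z, x = y0, 0.0, 1.0
        path = [y]
        for _ in range(int(round(self.tau / dt))):
            psi = self._psi(x)
            f = (psi @ self.w) / (psi.sum() + 1e-10) * x * (g - y0)
            dz = (self.alpha_z * (self.beta_z * (g - y) - z) + f) / self.tau
            dy = z / self.tau
            dx = -self.alpha_x * x / self.tau
            y, z, x = y + dy * dt, z + dz * dt, x + dx * dt
            path.append(y)
        return np.array(path)


# Demonstration: a minimum-jerk reach from 0 to 1 m over 1 s.
t = np.linspace(0.0, 1.0, 1001)
demo = 10 * t**3 - 15 * t**4 + 6 * t**5
dmp = DMP1D().fit(demo, dt=t[1] - t[0])
replayed = dmp.rollout()        # same start and goal as the demo
shifted = dmp.rollout(g=1.03)   # goal moved by 3 cm, same learned shape
```

A real arm would run one such primitive per joint or Cartesian axis and add contact handling; the point here is only that a demonstration fixes the shape while the start and goal stay free parameters.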

(Author's note: In mathematics, “topology” is concerned with the relative relationships between objects rather than precise distances and shapes. For door opening, the topological relationship is the relative position of the “hand” and the “doorknob.” As long as the core “grasping” relationship holds, the robot can recognize the doorknob and reach the “grasp” key point regardless of the knob's color, shape or material.)
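The note describes the key point as a relationship between hand and doorknob rather than a fixed spot in the room. A minimal geometric sketch of that relative-position idea, assuming a planar robot frame (x forward, y left) and a hypothetical 3 cm grasp offset, stores the key point in the doorknob's own frame, so a navigation error of a couple of centimeters moves the target along with the knob. This illustrates the idea only; it is not UniFlex's representation, which the interview does not detail.

```python
import numpy as np


def pose_2d(x, y, theta):
    """Planar pose as a 3x3 homogeneous transform (x, y in metres, theta in radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s, c, y],
                     [0.0, 0.0, 1.0]])


# Hypothetical "grasp" key point stored relative to the doorknob, not the room:
# 3 cm out from the knob centre along the knob's own x-axis (homogeneous coords).
GRASP_IN_KNOB = np.array([0.03, 0.0, 1.0])


def grasp_target(knob_pose_in_robot):
    """Map the object-relative key point into the robot's frame."""
    return (knob_pose_in_robot @ GRASP_IN_KNOB)[:2]


# Nominal approach: knob detected 60 cm straight ahead.
nominal = grasp_target(pose_2d(0.60, 0.00, 0.0))
# Robot parked 2 cm too far left, so the knob appears 2 cm to its right.
offset = grasp_target(pose_2d(0.60, -0.02, 0.0))
```

The grasp target shifts by exactly the knob's apparent offset, so no retraining is needed for small navigation errors; larger changes, such as the one-meter competition doors, can still fall outside what the demonstrations covered.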

Intelligent Emergence: So what is Unix AI's core technical approach? It seems you aren't betting on VLA like other companies?

Yang Fengyu: In the short term, we don't use VLA in our application scenarios.

In the long run, I'm optimistic about the VLA approach. But given the current lack of massive robot datasets, the time is not yet ripe for end-to-end VLA.

Intelligent Emergence: Some teams have added tactile sensing (the “T”) to VLA to form VTLA. What's your view?

Yang Fengyu: Tactile sensing is very important. Our UniTouch is a large-model system that fuses vision and touch to improve the robot's understanding of materials and contact feedback, so that it handles objects more the way humans do.

However, we don't take the VTLA approach, because vision and touch are two complex sources of perception. In practice, many teams working on VTLA use an almost “black-box” end-to-end model.

They encode multimodal information, such as tactile and visual signals, into a complex latent vector at the bottom layer of the model, then feed that vector directly into a downstream action decoder or body-control module.

The core problem with this approach is its lack of interpretability. It's a bit like alchemy: you throw in all the necessary ingredients, but how the tactile and visual information gets fused is opaque.
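The “black-box” fusion Yang criticizes can be sketched in a few lines. This is a deliberately simplified illustration, not any particular team's VTLA model: the dimensions are made up and random weights stand in for trained encoders. Both inputs are concatenated and projected into a single latent vector that feeds the action decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; real systems use large learned encoders.
VISION_DIM, TACTILE_DIM, LATENT_DIM, ACTION_DIM = 512, 64, 128, 7

# Random weights stand in for trained parameters.
W_fuse = rng.normal(scale=0.02, size=(VISION_DIM + TACTILE_DIM, LATENT_DIM))
W_act = rng.normal(scale=0.02, size=(LATENT_DIM, ACTION_DIM))


def black_box_policy(vision_feat, tactile_feat):
    """Concatenate both modalities, squeeze them into one latent, decode an action."""
    joint = np.concatenate([vision_feat, tactile_feat])
    latent = np.tanh(joint @ W_fuse)  # every latent entry mixes vision and touch
    return latent @ W_act             # e.g. 7 joint commands (assumed)


action = black_box_policy(rng.normal(size=VISION_DIM), rng.normal(size=TACTILE_DIM))
```

Because each latent coordinate is a learned mixture of all visual and tactile inputs, nothing in the structure says how much a given action relied on touch, which is the interpretability gap Yang points to.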

Our UniTouch instead feeds touch directly into our UniFlex imitation-learning framework as a multimodal key point. In the pre-training stage, we first use a pre-trained model to establish the relationship between visual and tactile data, so that when the robot “sees” an object it can imagine how the “contact” will feel, and then decide on its grip force and grasping method accordingly.
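Yang says UniTouch first links vision and touch through a pre-trained model, but not how. One widely used way to learn such a link is CLIP-style contrastive alignment; the sketch below assumes that technique purely for illustration. Visual and tactile embeddings recorded during the same contact are pulled together, and afterwards a camera embedding can retrieve the most similar stored tactile signature, a crude version of “imagining” touch from sight. The embeddings are random vectors, and the temperature of 0.07 is an assumed, commonly used value.

```python
import numpy as np


def l2_normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def contrastive_loss(vision_emb, tactile_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of both matrices comes from the same contact event."""
    logits = l2_normalize(vision_emb) @ l2_normalize(tactile_emb).T / temperature
    idx = np.arange(len(logits))  # matching pairs lie on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))


def imagine_touch(vision_query, tactile_bank):
    """Index of the stored tactile embedding most similar to the visual query."""
    return int(np.argmax(l2_normalize(tactile_bank) @ l2_normalize(vision_query)))


rng = np.random.default_rng(1)
vision = rng.normal(size=(8, 32))
tactile_aligned = vision + 0.1 * rng.normal(size=(8, 32))  # well-aligned pairs
tactile_shuffled = np.roll(tactile_aligned, 1, axis=0)     # pairs deliberately broken
```

In real training, a loss like this would be minimized over the encoder parameters; here it only shows that aligned pairs score a lower loss than mismatched ones, and that retrieval then finds the matching tactile entry.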

Intelligent Emergence: So do Unix AI's robots currently use visual-tactile sensors?

Yang Fengyu: Currently, Unix AI's robots do not have high-precision visual-tactile sensors installed.

That's because the tactile sensor industry has not yet found a good solution that combines all three of high signal density, durability and low price.

My model would certainly perform well with visual-tactile sensors. The cost, however, is an extra 6,000 to 8,000 yuan per finger for each sensor. These sensors are also not durable and would make the gripper thicker. For now, visual-tactile sensors are not cost-effective.
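The per-finger figure adds up quickly. As a back-of-the-envelope sketch, assuming (hypothetically) a robot with two hands and a given number of fingers per hand, the extra outlay per robot at 6,000 to 8,000 yuan per finger is:

```python
def tactile_sensor_cost(fingers_per_hand, hands=2, per_finger=(6000, 8000)):
    """Extra visual-tactile sensor cost per robot, in yuan, as a (low, high) range.

    The per-finger range comes from the interview; finger and hand counts are assumptions.
    """
    fingers = fingers_per_hand * hands
    return fingers * per_finger[0], fingers * per_finger[1]


print(tactile_sensor_cost(2))  # two-finger grippers on both arms: (24000, 32000)
print(tactile_sensor_cost(5))  # five-fingered hands on both arms: (60000, 80000)
```

Even under the two-finger assumption, that is 24,000 to 32,000 yuan per robot before counting the replacement costs implied by the durability problem Yang mentions.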

The second-generation Wanda robot, developed full-stack in-house. Photo provided by the interviewee.

Intelligent Emergence: How important is hardware to you?

Yang Fengyu: This is the first year of robot mass production. I believe hardware stability is of overriding importance.

Intelligent Emergence: Why do you insist on developing the full hardware stack in-house? What are the difficulties?

Yang Fengyu: First, there are no standard robot products yet. Co-developing products with upstream suppliers takes a long time, and developing in-house gives us better control over the timeline. Moreover,