Dao Fang, former head of Taobao's live-shopping business, starts a new company and wants to use AI to create a "cyber bestie" for consumers | Exclusive from Intelligent Emergence

Yongyi, 2025-08-08 12:27
Aesthetics can be broken down into elements. This is a new opportunity for AI.

This article is from the WeChat official account "Intelligent Emergence". Author: Deng Yongyi. It is published by 36Kr with authorization.

Text | Deng Yongyi

Editor | Su Jianxun

Every time you click into a live-streaming room, your choice profoundly changes the business ecosystem.

Now, the live-streaming ecosystem has become relatively mature. The "influencers" in live-streaming rooms and on social media have effectively become a brand's storefront. Their personas represent the target audience, style, and positioning. Influencers capture the latest fashion elements of the new season and quickly create design drafts and samples; domestic factories can complete production and delivery within half a month.

This model, known as "small-order, fast-return", has operated for many years, has given rise to cross-border giants like SHEIN, and has become the standard in the domestic market.

However, in the vast overseas market, e-commerce infrastructure varies greatly. Gaps in logistics and production supply, together with a more diverse user base, leave a large amount of room for AI to transform.

This is what Dao Fang, who once ran Taobao's live-streaming business, most wants to achieve after leaving Alibaba.

He was born in 1983 and has lived through every wave of the mobile Internet. After graduating in 2008, he worked at companies such as Baidu and 360 as an algorithm expert. In 2013, he joined Alibaba and stayed for 11 years. There he built the Indian short-video platform VMate from scratch, growing it to hundreds of millions of users, ran the "Dian Tao" app, and began overseeing Taobao's live-streaming business in 2021.

"From the aesthetics of influencers, to the desks of designers, to the factory, and finally to the consumers, this process is too long, and the information is severely distorted," Dao Fang said.

In markets where e-commerce is less developed, he hopes to build a complete AI e-commerce service system that bridges the domestic supply chain and the overseas influencer ecosystem.

The flow of goods follows traffic, which directly reshapes the supply-chain ecosystem. In the traditional clothing supply chain, for example, the sales system may have 5 to 9 tiers, the circulation cycle is long, and markups are high. Brands must design, sample, and place orders for all products half a year in advance, so traditional women's clothing brands predict best-sellers with relatively low accuracy and end up with inventory problems.

On the consumer side, the development of large models has made one-on-one service possible for everyone. This is the starting point of Dao Fang's new startup project, Infimate: to create an exclusive one-on-one "cyber bestie" for everyone.

This cyber bestie is like your dedicated AI shopping agent, a friend who understands fashion trends and offers personalized outfit suggestions. "It will put together outfits for users based on their body shape, skin tone, and current fashion trends, and give professional advice," Dao Fang told us.

Most importantly, the "cyber bestie" must take over many of the tedious chores of shopping: calculating the best discounts, grabbing coupons, sharing money-saving tips, and monitoring the best time to place an order.

This is also why Dao Fang thinks the timing for starting a company is ripe after the wave of agents set off by Manus: models can now call internal tools to complete a variety of tasks. All of this must rest on sufficiently deep vertical-domain data and scenarios.

He cited once-popular trends such as "Tiffany blue" and "avocado green" as examples. These trends were created by KOLs and influencers and then spread to a wider range of consumers.

"What these KOLs have achieved is to 'coin words' in the fashion field, creating and spreading the popular fashion elements of the moment," Dao Fang said. Trained on public-domain data and on preferences that users actively provide, large models are already capable of doing what KOLs do: capturing, predicting, and expressing fashion elements, and giving users more accurate recommendations.

At the same time, once the model has established these tags, it can also offer this capability to the business side, serving brands, influencers, and others. The service scenarios are diverse, including best-seller prediction, best-seller generation, and product-selection decision support, ultimately forming an AI e-commerce service system.

However, this inevitably raises a question: what Dao Fang wants to do seems to be within the reach of large companies. Why would the opportunity fall to startups?

"First of all, we are targeting the overseas market. Overseas e-commerce platforms are more fragmented in both ecosystem and player landscape, and the degree of platform monopoly is relatively low. This ecosystem is more conducive to the entry of large models: they provide personalized customization for consumers, and users will naturally trust a third-party agent more," Dao Fang said.

More importantly, Dao Fang once built a business from scratch inside a large company. He knows well that large companies must weigh their existing businesses and resource allocation when making decisions. For newer directions such as agents, startups still have plenty of time and room.

In fact, after leaving Alibaba, Dao Fang announced in his WeChat Moments that he would start a business in the field of embodied intelligence, aiming to "contribute his own bit to enabling robots to enter thousands of households as soon as possible."

But after exploring for a while, he and his core startup partners concluded that embodied intelligence is still at an early stage of technological exploration. It will take at least 5-10 years before a commercial leap; "it's still a bit far from consumers."

To get closer to consumers and the market, Dao Fang chose to postpone the exploration of embodied intelligence and focus on AI + e-commerce.

Eleven years at Taobao, in charge of the booming live-streaming e-commerce business, have left a deep mark on Dao Fang: he speaks extremely fast, and his continuous train of thought leaves hardly any opening for questions.

Dao Fang also differs from the usual image of senior executives at large companies, who are imagined to command sky-high budgets and handle grand strategies with ease. From Dao Fang's account, we sense a simple goal instead: find scenarios close to users, quickly establish a closed business loop, and support the team.

This is Dao Fang's first public statement after leaving Alibaba. The following is a dialogue between us and Dao Fang, edited and organized:

△ Cheng Daofang, founder of Infimate. Image source: provided by the company

 

Considered making household robots, but it may be too early

"Intelligent Emergence": After you left Alibaba, your first WeChat Moments post said you were going to build embodied robots. Now you're doing AI e-commerce. How did this idea evolve?

Dao Fang: Around July last year, when the first wave of GPT's impact hit within Alibaba, I started researching the impact of AI on embodied robots and the e-commerce field. I didn't finally decide to start a business until February this year.

Actually, after I left, I was still pursuing both directions simultaneously. After all, we have researched both embodied robots and AI e-commerce in depth for many years, and I majored in computer science myself.

"Intelligent Emergence": From what direction did you want to start at that time?

Dao Fang: Since July 2023, when I was still at Alibaba, I had been following AI applications in both embodied intelligence and e-commerce. After I started my own business and we had gathered all the information, we thought about starting with household robots.

At that time, we thought the bottleneck in embodied intelligence lay in operation data from real scenarios. So, in terms of a feasible path, we should target vertical scenarios: start with existing methods, get to work, and gather more feedback and data once inside the scenario, so as to iterate rapidly.

Our goal is still to make embodied robots, but the field is at an early stage of technological development. The third step in particular, fine-grained operation, still requires long-term technological accumulation and breakthroughs. We estimate it will take at least 5-10 years to reach the level of practical household use.

"Intelligent Emergence": After the research, how do you view the development stage of this field?

Dao Fang: Embodied intelligence actually involves three technological directions.

The first technological direction is multi-modal AI interaction, which covers the interaction interface and overall cognitive understanding. This is more software-related and develops in parallel with large models; it is an extension of large-model apps like Doubao. Doubao's multi-modal interaction is mainly text and voice, but robots also need to control expressions, eye contact, tone of voice, micro-movements, and so on, which requires more diverse interaction and control.

The second technological direction is locomotion based on reinforcement learning, which mainly addresses bipedal walking and whole-body control. Activities like singing, dancing, and doing flips all build on this part.

The most important is the third direction: end-to-end imitation learning for hand operations. It is hand operation that can truly replace humans.

In terms of the current technological progress, the progress of the first two parts is relatively good, especially the locomotion ability based on reinforcement learning. All the flips and marathon running of robots we have seen are the results of the development of this technology. However, many of these demonstrated robots still need to be controlled by a remote control.

To achieve autonomous operation, the multi-modal interaction ability of the first step is required, including expressions, micro-movements, and voice, as well as the planning layer. The reinforcement learning of the second step is also in decent shape.

"Intelligent Emergence": Now, the first two steps should have been done quite well.

Dao Fang: The difficulty everyone faces now is the third step, fine-grained operation. It involves contact-based fine movements that require millimeter-level precision, it needs tactile and force feedback, and it has to be general-purpose to enter households, so it is very difficult.

After my research at Alibaba, I concluded that embodied robots should develop by targeting vertical scenarios and starting there with existing methods. The accuracy is not very good at first, but by continuously feeding in data and handling corner cases, it will get better and better.

The problem is that corner cases are also very hard to obtain. There is a serious lack of high-quality data at present. From my research, the generally recognized best solution is to build an end-to-end model, but this requires at least billions of data samples, and the bottleneck is still very obvious.

"Intelligent Emergence": Besides data, are there any other difficulties?

Dao Fang: First, there are problems with precision and force control. Hand operations require millimeter-level precision and force feedback. For example, when you pick up an egg, too much force breaks it and too little lets it drop. This requires very precise force control. Currently, it is very difficult for robot hands to match the fine tactile feedback and force control of human hands.

Second, there is the challenge of generality. Unlike industrial robots, which only need to perform a few fixed actions, household robots must handle many different objects, environments, and tasks. They may need to fold clothes one day, wash dishes the next, and organize the bookshelf the day after. The action space and requirements of each task are completely different.

Third, there is the problem of data collection. Imitation learning requires a large amount of high-quality demonstration data, but data on human fine-grained operations is difficult and expensive to collect. Unlike image recognition, where data can be crawled from the Internet, hand-operation data has to be specially recorded, and the accuracy of the actions has to be ensured.

There are also too many corner cases. The real environment varies too much: the materials, shapes, weights, and placement of objects all differ, and it is very difficult to cover every situation with limited training data.

So, although the technology of the first two steps is relatively mature, the third step, fine-grained operation, is what determines whether robots can be put into practical use, and it is also the hardest to overcome.

"Intelligent Emergence": Does this mean that the application scenarios are very limited?

Dao Fang: Currently, there is an obvious mismatch between technological capabilities and market demand. We want to create scenarios that are more "useful" than "interesting".

But the fine-grained operation functions that users really need, such as handling daily household chores and sorting items, are still far from technologically mature. Currently, robots are mostly used for simple greeting interactions and performances, and are still far from doing real-world work.

Of course, this doesn't mean it has no value. We just need to face this reality and also understand why the consumer-grade robot market is progressing relatively slowly. Technological development takes time, and the market also needs to find that balance point.

"Intelligent Emergence": So, is it because the technological development curve is still in its early stage that you chose to consider other directions?

Dao Fang: In the third stage of embodied intelligence, the part where robots can truly plan and operate autonomously, the generally recognized end goal is to build an end-to-end model. But the real problem now is that, in the current end-to-end data-driven approach, the imitation learning methods in use generalize poorly, and it is very difficult to create an experience that exceeds expectations.

What we want to do is still to make robots enter thousands of households and really be able to do work.

We expect that this direction needs to wait for some general-purpose "blockbuster products" to emerge before our team's strengths, such as product development and commercialization, can be put to better use.

Want to be an "AI bestie", not a cold recommendation system

"Intelligent Emergence": If the embodied intelligence field is still too early, what opportunities in AI + e-commerce did you see later?

Dao Fang: It's mainly because we saw the huge transformative opportunities brought by AI technology, especially in the e-commerce field. What we want to do is essentially to explore the possibility of new e-commerce entry points in the AI era, which requires a brand-new perspective and organizational form. I believe that the application of AI in the e-commerce field, especially in personalized recommendation and content generation, has great potential. Entrepreneurship will give me more freedom to explore these possibilities.

"Intelligent Emergence": Which key technological changes are the most core?

Dao Fang: Manus, early this year, was actually a landmark event.

I would divide the AI applications of recent years into three stages:

  • The 1.0 era was mainly the dialogue-interaction stage, which solved some simple information needs and lasted for about 2-3 years;
  • The 2.0 era of AI vertical applications focused more on improving efficiency, such as text-to-video and image-to-video technologies applied in various vertical fields. Most of these applications target the enterprise side;
  • The 3.0 era is a very typical development stage. Some new models, such as GPT-4, have incorporated the use of tools into model training. This means that technology in the 3.0 era, such as the applications represented by Manus, has gone beyond information interaction and started to enter the stage of operation.

What does this mean? It means that AI is no longer just providing you with information. It can be like an intern, helping you operate the computer and perform some mechanical but necessary tasks, such as collecting information and applying filters. This ability to use the computer is of great value.

"Intelligent Emergence": How would you summarize what you want to