
Former senior executives from Meituan, ByteDance, and Youdao start a company to build an "AI learning companion robot" | Exclusive from Intelligent Emergence

Wang Fangyu 2025-08-28 09:00
In two years of building his startup, Bao Ta surveyed hundreds of families and abandoned the AI plush toy form that many in the market are pursuing. He admits that he is very cautious, a caution that stems from his respect for the hardware business.

Text by | Wang Fangyu

Edited by | Su Jianxun

Since the explosion of large models, there has been a continuous wave of startups in fields such as AI companionship, AI education, and AI toys. However, the market still lacks a clear example of product-market fit (PMF) when it comes to the right hardware form and interaction mode.

The startup "Singularity Lingzhi", founded in July 2023, is trying to offer its own answer. The founding team of this company consists of three experienced Internet veterans:

Founder and CEO Bao Ta once served as the vice president of Meituan and the CTO of Meituan Finance. Even earlier, he was the general manager of NetEase Youdao Dictionary;

Hardware director Xu Yifei once served as the platform product director of ByteDance's "Dali Smart Desk Lamp";

Marketing director Hu Chen was also a member of NetEase Youdao's founding team.

Bao Ta, founder and CEO of Singularity Lingzhi (Image source: authorized by the company)

According to exclusive information obtained by "Intelligent Emergence", "Singularity Lingzhi" recently completed an angel round of financing worth tens of millions of yuan, led by Xinglian Capital (Z Fund). The company's first product, an AI English learning companion robot, targets preschool children aged 3 to 8. The product has been developed and is currently in internal testing.

When he started the AI hardware business in September 2023, Bao Ta had a clear positioning for the product: an AI hardware product that helps preschool children learn English.

His judgment is that China currently has a large market for educational hardware for young children, and the competitive landscape is fragmented. Large AI models are bringing an "opportunity to upgrade from feature phones to smartphones." He chose English as the entry point because English education is where parents of young children in China are most willing to spend money and where they invest the most.

However, when it came to formally defining the hardware product, he and his team faced difficulties: there are numerous combinations of forms, functions, and interaction methods for AI education and companionship hardware. How should they make choices and trade-offs?

His past work experience made Bao Ta extremely cautious about the above issues. At Meituan, he once suffered a major setback in hardware.

At that time, as the technology director of Meituan Finance, he took part in the famous battle over offline merchant payment collection. However, when market demand exploded unexpectedly, the business ran into supply-chain shortages and rising prices for key components. This was a significant "mistake" for a business with thin profit margins.

This setback left a deep impression on Bao Ta and gave him more respect for hardware products: "Hardware is different from software. The cost of trial and error is high. Once the product is molded and finalized, you can't turn back."

Therefore, in order to find the appropriate product form, Bao Ta and his team spent a lot of time conducting user research and product testing, and they rejected several product plans one after another.

Bao Ta told us that shortly after the company was officially established, he led the team in testing the form taken by today's mainstream AI toys, namely a plush toy plus a voice-dialogue box. They ran a quick round of tests in which humans fully simulated the AI. In the end, they found that children quickly lost interest in the product.

Subsequent product ideas, including some prototype products, were verified within "Singularity Lingzhi" using this method, but these forms were ultimately abandoned one by one.

To find the right product form, Bao Ta and his team also visited hundreds of families in China's first- and second-tier cities over the past year or so. They sent internal test units to users' homes and let families use them for several weeks. They then decided which product functions to keep or remove based on indicators such as word of mouth and retention rate, and iterated on the product accordingly.

The team finally concluded that three core conditions are essential to maintaining young children's long-term interest in AI education products.

First, there should be multiple modalities including voice, vision, and touch to provide rich interactive experiences;

Second, a content system needs to be constructed to input and guide children with content;

Third, there should be a hardware image that provides emotional value and conforms to children's aesthetics and preferences.

"Currently, there are already products in the market that combine one or two of these aspects, but I think only when all three points are met can we better retain long-term child users," Bao Ta told "Intelligent Emergence".

Based on this, nearly two years after the company was founded, "Singularity Lingzhi" finally settled on the form of its first product: a desktop robot with a screen, a cute physical image, and AI multi-modal interaction capabilities.

Bao Ta has not set a sales target for this product. He believes that as long as users truly recognize the product and there are real user data and feedback, that is enough to support the company in going further and polishing the product to maturity.

"This is no longer an era where you can get financing with just a PPT. Instead, we first spend time to make the product, and then show investors the real robot and real user feedback, so that they can participate more actively and confidently." Bao Ta said.

This is Bao Ta's first public statement since leaving Meituan. The following is a dialogue between "Intelligent Emergence" and Bao Ta, edited and organized:

There is an opportunity to upgrade educational hardware for young children from feature phones to smartphones

"Intelligent Emergence": Your previous resume includes NetEase Youdao Dictionary and Meituan Finance. Why did you think of doing AI education?

Bao Ta: I graduated from the Department of Computer Science at Tsinghua University, majoring in artificial intelligence. After graduation, I first worked at Microsoft Research Asia, and then founded Youdao with my classmates.

Later, during my time at Meituan, I also used big data and AI to solve problems such as marketing efficiency and accurate user identification in the financial field. My studies and career have always been highly related to AI.

After the explosion of large AI models, I came to see this as a major opportunity, closer to an industrial revolution than search engines or the mobile Internet were. I had to seize it, so I started my business in September 2023. Based on my understanding of the industry and the market, I think combining large AI models with applications in education has great potential.

"Intelligent Emergence": What opportunities did you see in AI educational hardware?

Bao Ta: We found that there is no particularly suitable educational hardware for children, especially young children.

Currently, children use electronic products such as tablets more often. Parents generally worry about children's addiction and the impact on their eyesight, so they usually limit the time their children use these devices.

A significant proportion of children's devices are single-function products, such as storytellers, point-reading pens, and thinking machines. Each has its suitable scenarios, but they also face challenges in personalized learning and in-depth interactive learning.

After the emergence of large AI models, we have the opportunity to better attract children's attention and integrate learning content while meeting their interests. So I think there is a market gap in educational hardware for young children, much like the opportunity to upgrade from feature phones to smartphones.

In terms of market size, interest-based education for young children is not as much of a rigid demand as exam-oriented education at the K12 stage, but it still has a considerable scale, reaching tens of billions of yuan annually. There are many types of products in this field, and the competitive landscape is fragmented, unlike the exam-oriented education market, which has been divided up by several giants.

"Intelligent Emergence": What role does the large AI model play in filling this gap?

Bao Ta: Since the beginning of our business, we have been particularly focused on the interaction upgrade that AI multi-modal capabilities can bring to educational hardware. If the hardware product can "see" children, recognize their actions, the objects in their hands, and the environment they are in, and actively interact based on this, the interaction will be richer and more vibrant. The input method is also no longer limited to touching a screen, which greatly lowers the barrier to interaction.

For example, if there is a robot on the dinner table at home and it sees a child eating pizza, the robot will actively start a conversation: "What are you eating?" This kind of interaction is very similar to that of a real foreign teacher. This is the result we pursue by combining large-model technology with multi-modal interaction.

"Intelligent Emergence": Your product is called an AI learning companion robot, highlighting the companionship function. How do you understand the concept of companionship?

Bao Ta: The concept of companionship is relatively vague for users, and different people have different understandings of it.

As for what makes a good companionship product for young children, we think companionship has at least three distinct values. The first is playing with them, the second is studying with them, and the third is accompanying them in daily life, which includes emotional chat and similar interactions. Different companies focus on different aspects of these three values.

Our requirement for ourselves is to achieve two things. First, children should like playing with it, which is an important starting point; second, parents should recognize its value and be willing to pay for it. We made many trade-offs within the general direction of companionship and finally focused on companionship built mainly around the value of English education, gradually turning it into a very clear entry point.

"Intelligent Emergence": Why choose English education as the entry point instead of encyclopedias, storytelling, or interest exploration?

Bao Ta: It is completely based on user value.

From our observation, parents of young children in China are willing to spend money on education, and the field where they invest the most is English education, whether that means taking classes with foreign teachers, signing up for various courses, or buying intelligent hardware for English learning. We believe the market is large and the demand is rigid.

In AI, "large model" is really shorthand for Large Language Model (LLM). So it naturally has strong language abilities, better than the English level of most parents in China, and it is very well-suited to English education. This is already reflected in some products, such as the foreign-language teacher dialogue function in Duolingo, which is used by adults.

"Intelligent Emergence": Is there a contradiction between companionship and learning? Does the idea of "combining education with entertainment" hold?

Bao Ta: We think it holds. For example, the Little Genius learning machine was a product that combined education with entertainment for our generation when we were children, and it sold very well. It was the best educational device for children to access information in the past. There is still an opportunity to combine education with entertainment in the current AGI era.

Three core conditions are needed to capture the interest of young children

"Intelligent Emergence": Has your first product come out? What is its form?

Bao Ta: Our prototype has come out. Currently, it is in the intensive internal testing stage and has not yet been unveiled to the public.

Its general product form is an AI hardware device with a screen, a cute cartoon robot image, and multi-modal interaction capabilities. We call it an AI learning companion robot. Its core function is to accompany learning, so we did not give it the ability to move around on its own the way robotic dogs do.

"Intelligent Emergence": Your company was founded in July 2023. Have you been polishing the product during this period?

Bao Ta: In fact, it has been exactly two years from the time the team was formed until now. Our founding team has experience in developing large-scale products, and we have high requirements for PMF. At the same time, this is a product with complex forms and functions, and we need to find a new PMF.

Our past entrepreneurial experience makes us clearly understand that this is a test that the team must pass internally first, and we cannot let consumers pay for an idea.

So before the product came out, we spent a lot of time polishing and repeatedly demonstrating it. The observations and understandings mentioned above all come from a large amount of user research, product testing, and product exploration and iteration we did before.

We converged on the direction of a companionship robot built mainly around the value of English education at around the end of 2024. It took us more than half a year to make the first prototype, and then we began the stage of internal user testing, user feedback, and co-creation.

There is also a reason related to the development stage of industry technology. If we had wanted to make a chatty smart speaker, there may already have been many homogeneous products in the market last year. But the complete form as we understand it requires better multi-modal technology. It was not until the launch of Gemini 2.0 at the beginning of this year that the multi-modal capabilities of the entire industry took a step forward. Only then was there a relatively good guarantee for the product experience.

"Intelligent Emergence": Your product has a complex positioning and functions. Why is it positioned and shaped like this?

Bao Ta: Yes, our product is very different from the existing ones in the market. Users who have used our product may feel it is very unique. This difference essentially stems from our understanding of user needs.

Our target users are young children aged 3 to 8. Without the pressure of exams, they are particularly interest-driven and like to play. The most difficult part of making such products is holding their interest over time. To achieve this, we believe there are three core conditions:

First, there should be multiple modalities, including vision and physical operation, to provide richer interactive experiences. Currently, the high return rate of some AI toys is partly because voice-only interaction easily makes children lose interest. Humans take in about 70% of sensory information through vision, far more than through hearing, smell, or touch.

Second, young children need content input and guidance. Children have limited knowledge of the world and have difficulty finding topics on their own. So a large amount of content is needed for guidance. Therefore, we need to build a content system that can continuously attract children's interest and gain parents' recognition.

Third, there should be a hardware image that provides emotional value. Children at this age are used to interacting with physical toys. They have their own preferences and emotional connections. Hardware with a physical form can build a partnership with children and provide emotional value, which is better than an intangible image on an electronic screen. A child can take a teddy bear to bed, but cannot hug a tablet to sleep.

We think all three points are very important. I noticed that there are already products in the market that target one or two of these aspects, but I think only when all three points are met can we better retain long - term child users.

The product positioning and functions of Singularity Lingzhi are complex (Image source: authorized by the company)

"Intelligent Emergence": Will you also create the content yourself?

Bao Ta: Yes. In the initial stage of the product, we mainly use our own content to support the cold start. In the future, we will gradually expand to be compatible with more third-party content.

In building the content system, we use a large amount of AI-generated content and large-model technology. We use large AI models to integrate the scenarios that children are most familiar with and like with the English teaching content we designed. We are mainly driven by children's interests while also taking into account the supplement of knowledge