A former Baichuan Intelligence co-founder's AI audio bet: "I'm going to create 'people': AI anchors"
Text by | Zhou Xinyu
Interview compilation by | Zhong Chudi
Editor | Su Jianxun
Released in 2013, "Her" is Jiao Ke's favorite movie.
In the film, the AI Samantha has no face or body. All the audience can perceive is her gentle, calm voice. When Samantha says, "You've been through so much lately. You've lost a part of yourself," the male protagonist bursts into tears.
This scene greatly touched Jiao Ke: "Just a voice can create such a strong emotional connection."
In early 2025, the year in which "Her" is set, Jiao Ke, a co-founder of Baichuan Intelligence, left to start his own business, founding an AI audio company called Laifu Radio.
△ Jiao Ke, former co-founder of Baichuan Intelligence and founder & CEO of Laifu Radio. Photo source: Provided by the interviewee
When he started the company, audio was a contested field. Google's knowledge-base tool NotebookLM, released in July 2023, can generate 10- to 20-minute audio from users' research materials, and it opened up people's imagination about what AI podcasts could be.
Yet despite that promise, the domestic audio industry has so far achieved little. The leading podcast platform, Xiaozhouyu, had only about 6 million monthly active users in early 2024, far fewer than long-video platforms.
While fundraising, Jiao Ke also faced plenty of skepticism: audio transmits information far less efficiently than video, and the market ceiling for audio is low.
When we put the same question to him, Jiao Ke spent more than 30 minutes answering, ranging from "Her" and "2001: A Space Odyssey" to Xiaozhouyu and Doubao. For him, there are plenty of reasons to focus on audio:
Due to high production costs, the supply of domestic audio content is too low, while users have a large amount of "ear time" every day;
Currently, the supply of high-quality audio content cannot meet the personalized audio needs of different users;
More importantly, compared with video and text, audio is the most natural way of human interaction and has a strong companion attribute.
He told us that AI is what can unlock audio's greatest strengths.
On one hand, speech understanding and generation technologies can solve the supply problem while also building an emotional connection with users during interaction;
On the other hand, AI is beginning to understand and perceive users' preferences.
Conveniently, speech is also the most efficient way for users to produce information. Through voice interaction, users can generate enough Long Context, and from that accumulated Long Context, AI can summarize their preferences and recommend suitable audio content.
△ Business news podcasts recommended by Laifu Radio based on the author's listening history. Photo source: Author's trial
Not everyone agrees with this non-consensus logic. However, it has also won over some investors, such as Shen Nanpeng, founding and managing partner of Sequoia China. Sequoia took only a week to go from opening the deal to approving it.
In the second half of 2025, Laifu closed a second funding round led by Fortune Capital, with Sequoia China participating. The two rounds totaled more than US$10 million.
Building an AI podcast platform, however, was never Jiao Ke's real goal. What he wants to create is "people": AI anchors.
In the Internet era, Jiao Ke ran the music service "MP3 Search" at Baidu, founded a consumer-facing (ToC) financial platform, and led a government-facing (ToG) project in the Middle East. In the AI era, this Internet veteran began to ask himself: what product form would be fundamentally different from those of the Internet?
His answer is: The Internet era solved the problem of connection efficiency, while AI solves the problem of productivity.
Therefore, tools and platforms are products of the Internet era, while "people" are the product form unique to the AI era.
This is also the operating logic of Laifu Radio today.
Jiao Ke told us that currently, there are 15 Chinese AI anchors and 2 English AI anchors on Laifu. They have different styles, host different channels, and can remember listeners' preferences.
"You will form a connection with the anchors. Just like with a radio program, if the anchor changes, you'll feel uncomfortable." To make users feel the presence of "people," Jiao Ke designed a ball that fills most of Laifu's screen and bounces in rhythm with the AI anchor's speech.
△ The ball that bounces in rhythm with the AI anchor's speech. Photo source: Author's trial
When users open Laifu, they find that their favorite AI anchors have already prepared content they're interested in, ready to be played. Users can also interrupt a program at any time to ask questions, join the discussion, or seek emotional companionship.
In Jiao Ke's view, this is the prototype of Samantha.
△ The author asks an AI anchor: Why has capital pulled out of the technology sector flowed into low-valuation, high-dividend stocks? Photo source: Author's trial
The following is the conversation between "Intelligent Emergence" and Jiao Ke, edited and condensed:
What I'm building is not an AI podcast but "anchors"
Intelligent Emergence: How do you define Laifu? Many people say it's an "AI podcast."
Jiao Ke: I don't think I'm creating an AI podcast platform.
Currently, Laifu has 15 Chinese AI anchors and 2 English AI anchors as we define them, and each "person" has a different style. Users often name a specific anchor when using the product.
Laifu highly emphasizes the human attribute. What we're actually creating is "people," AI anchors.
Intelligent Emergence: Xiao Chuan (founder & CEO of Baichuan Intelligence) also said he wants to "create people."
Jiao Ke: We have a great consensus on this.
The once-booming Internet healthcare industry didn't achieve much in the end. The reason is that the Internet revolution is essentially a revolution in production relations: it solves the problem of efficiency, not productivity.
However, China's biggest problem is that there are only 4.4 million doctors, and there are even fewer good doctors. The supply is severely insufficient.
In early 2023, I had many conversations with Xiao Chuan downstairs at his home. He said at that time that he wanted to create AI doctors. Why do we believe in AI healthcare? Because the essence of AI is a productivity revolution. Using AI to create doctors can fundamentally solve the supply problem.
Intelligent Emergence: Is the problem in the audio industry also a supply problem?
Jiao Ke: Yes. Recently I saw someone post: there are already so many human-made podcasts, so why should I listen to AI podcasts? In fact, producing audio by hand is very expensive, even more expensive than producing video.
With video, even if the host has an accent or the surroundings are noisy, you can add subtitles afterward without hurting the viewing experience. With audio, though, you can only listen, so the bar for recording quality is very high. You need a recording studio, or at least a good microphone. In post-production, you also have to edit out verbal tics, pauses, and repetitions.
The amount of audio humans produce is limited. Xiaozhouyu, for example, adds about 500,000 episodes a year, an average of more than 1,000 new episodes per day, while humans produce tens of millions of videos every day. No one complains that there are too many AI videos, so why do people think there is too much AI audio?
Intelligent Emergence: Although the supply is low, do users really have so much demand for listening to audio?
Jiao Ke: Everyone has a lot of "ear time" every day, such as during the commute to and from work, while exercising, doing housework, or before going to bed.
A Deloitte report states that, excluding music, there are about 1.6 billion audio listeners worldwide. Moreover, audio is a high-frequency, essential need: users listen to audio at least once every two days.
Intelligent Emergence: Currently, there are two mainstream directions for AI applications: one is tools, and the other is platforms. Are these not the product forms you want to create?
Jiao Ke: The platform economy is the product form of the Internet. Tool-type products are really one part of platform services. Platforms that serve both creators and consumers, for example, give producers creation tools, and the content made with those tools is then supplied to consumers.
Many current AI products are still platforms or tools, which large companies can easily replicate.
The real product form in the AI era should be "people," such as scientists, doctors, and anchors. This is a product form that the Internet era did not have but AI can achieve.
Intelligent Emergence: What is the product form of the "people" you create?
Jiao Ke: The movie "Her" is a great product manager because it defines how a product should interact with users.
At first, Samantha interacts with the male protagonist by proactively helping him handle his emails. Later, their relationship develops not because he keeps chatting with her, but because they play games and build things together. Working together to achieve something is real companionship.
A big problem with many AI companion products is that they rely heavily on users' active input. You have to keep talking to the AI, but most users don't have that much to say. So in the end, only a small number of users with a strong desire to express themselves stick around.
Intelligent Emergence: Why did you choose the audio industry to "create people"?
Jiao Ke: Audio has a value that video lacks, which is its conversational nature. It is the most natural way for humans to communicate, and it easily triggers emotion. In the past, emotional-support call-in shows were mostly on the radio, not on TV.
The movie "Her" has had a significant impact on this wave of AI entrepreneurship. GPT-4o's voice was even widely compared to Samantha's in the film. Many people don't realize that throughout the movie, Samantha has no image, only a voice.
So, the voice is very important. Audio has a strong companion attribute. Currently, images, videos, and robots have not overcome the uncanny valley effect, but audio can. This is an important reason for us to focus on audio.
Moreover, audio is non-intrusive and non-exclusive. Watching a half-hour video is quite tiring because your eyes, hands, and ears are all occupied. Listening to audio is not.
Over the next two years, I believe people will become less and less willing to take out their phones and tap into apps to socialize or search for information. Since machines can now understand human speech, our future interaction interface may well be voice.
Intelligent Emergence: How did you design the functions of Laifu?
Jiao Ke: Laifu does the same thing as in "Her." It uses content as the entry point for interacting with users. Users can not only listen to programs but also chat with the AI anchors at any time.
We want to create the feeling of randomly walking into a room where two anchors are discussing something you're interested in. You can sit down and listen quietly, or join their discussion at any time.
During this process, you will form a connection with the anchors. Just like listening to a radio program, if the anchor changes, you'll feel uncomfortable.
Laifu can also quickly produce audio content based on your needs or preferences. For time-sensitive content, for example, a human-made podcast may take a week to produce, while an AI anchor can prepare it in under an hour. This is the opportunity we see.
Intelligent Emergence: You've experienced the entire Internet cycle. What kind of thinking needs to be changed when starting an AI business?
Jiao Ke: If you think AI is a new technological cycle, be very careful not to use the inertia of the Internet to do things.
I've talked to many Internet product managers at Baichuan, and they still think about how to build platforms and two-sided markets, and how to sell advertising.
However, the network effect doesn't exist in the AI era. Many people starting businesses in the AI era say they want to build platforms, but platforms are the product form of the previous Internet era. The Internet doesn't change production but reduces transaction costs through the interconnection between producers and consumers.
So, large Internet companies all follow the platform economy model, allowing you to scale up both production and consumption ends simultaneously by investing a large amount of capital in a short period.
For example, Didi has drivers on one end and passengers on the other. If there is only one end, the economic model of the Internet platform is ineffective. Once both ends grow, the barrier of the Internet platform is established.
However, AI is a productivity revolution. Productivity directly produces goods or services, which results in a one-sided market. The heavy ad spending by some AI products in 2024 has shown that a one-sided market cannot be built by burning money: once a better product appears, users are likely to switch.
Since AI is a new technological cycle, don't use Internet thinking to create products. Instead, create something that represents a generational leap.
I value DTU (Daily Talk Users) more than DAU
Intelligent Emergence: Doubao is a product of a large company and has also launched an AI podcast function. It has a significant advantage in DAU. Will Doubao overshadow what you're doing?
Jiao Ke: Doubao is a tool-type product.
Intelligent Emergence: Many users have also established an emotional connection with Doubao.
Jiao Ke: Let's look at the proportion. Most users still regard Doubao as a search tool.
Chatbots are used in a grab-and-go way, so it's difficult for users to generate Long Context. By contrast, Laifu users now average half an hour of use per day.
Moreover, you have to actively interact with Doubao