The race in AI audio, from a former co-founder of Baichuan Intelligence: "I want to create 'people' by building AI anchors"
Text | Zhou Xinyu
Interview compilation | Zhong Chudi
Editor | Su Jianxun
Her, released in 2013, is Jiao Ke's favorite movie.
In the movie, the AI Samantha has no face or body. All the audience ever perceives is her gentle, calm voice. When Samantha says, "You've been through so much lately. You've lost a part of yourself," the male protagonist breaks down in tears.
This scene deeply touched Jiao Ke: "Just a voice can create such a strong emotional connection with people."
Then, in early 2025, the year in which Her is set, Jiao Ke, co-founder of Baichuan Intelligence, left his job to start his own business: an AI audio company called Laifu Radio.
△ Jiao Ke, former co-founder of Baichuan Intelligence and founder & CEO of Laifu Radio. Photo source: Provided by the interviewee
When he started the company, audio was a contested field. In July 2023, Google released the knowledge tool NotebookLM, which can turn users' research materials into 10- to 20-minute audio. The product opened up new possibilities for AI podcasts.
On the other hand, audio in China has so far achieved only mediocre results. The leading podcast platform, Xiaozhouyu, had only about 6 million monthly active users in early 2024, far fewer than long-video platforms.
During fundraising, Jiao Ke also faced plenty of skepticism: audio transmits information far less efficiently than video, and the market ceiling for audio is low.
When we put the same question to him, Jiao Ke talked for more than 30 minutes, ranging from Her and 2001: A Space Odyssey to Xiaozhouyu and Doubao. For him, there are many reasons to focus on audio:
Due to the high production cost, the supply of domestic audio content is too low, while users have a lot of "ear time" every day;
Currently, the supply of high-quality audio content cannot meet the personalized audio needs of different users.
More importantly, compared with video and text, audio is the most natural form of human interaction and has a strong companion attribute.
He told us that the key to maximizing the advantages of audio is AI.
On one hand, voice understanding and generation technologies can solve the supply problem and also establish an emotional connection with users during interaction;
On the other, AI is beginning to understand users' preferences.
Conveniently, voice is also the most efficient way for users to produce information. Through voice interaction, users generate enough long context, and from that accumulated context AI can summarize their preferences and recommend suitable audio content.
△ Business news podcasts recommended by Laifu Radio based on the author's listening history. Photo source: Author's trial use
Not everyone agrees with this unconventional logic. Still, it has won over some investors, such as Shen Nanpeng, founding and managing partner of Sequoia China. Sequoia took only a week from project initiation to approval.
In the second half of 2025, Laifu completed a second round of financing led by Fortune Capital, with Sequoia China participating. The two rounds together totaled more than US$10 million.
However, building an AI podcast platform was not Jiao Ke's original intention. He wants to create "people", that is, AI anchors.
In the Internet era, Jiao Ke was in charge of the music service "MP3 Search" at Baidu, founded a consumer-facing (ToC) financial platform, and led a government-facing (ToG) project in the Middle East. In the AI era, this Internet veteran began to ask: what product form would set the AI era apart from the Internet era?
His answer is: The Internet era solved the problem of connection efficiency, while AI solves the problem of productivity.
Therefore, tools and platforms are products of the Internet era, while "people" are the unique product form of the AI era.
This is also the operating logic of Laifu Radio today.
Jiao Ke told us that currently, there are 15 Chinese AI anchors and 2 English AI anchors on Laifu. They have different styles, host different channels, and can remember listeners' preferences.
"You will form a connection with the anchors. Just like listening to a radio program, if the anchor changes, you'll feel uncomfortable." To make users feel the presence of "people", Jiao Ke designed a ball that takes up most of the screen for Laifu, which bounces in rhythm with the AI anchor's speech.
△ The ball bouncing in rhythm with the AI anchor's speech. Photo source: Author's trial use
When users open Laifu, they find that their favorite AI anchors have already prepared content they are interested in, ready to be played. Along the way, users can interrupt the program at any time to ask questions, join the discussion, or seek emotional companionship.
In Jiao Ke's view, this is the prototype of Samantha.
△ The author asked the AI anchor: Why did the funds withdrawn from the technology sector flow into low-valuation and dividend stocks? Photo source: Author's trial use
The following is the content of the conversation between Intelligent Emergence and Jiao Ke, which has been organized and edited:
What I'm building is not an AI podcast, but "anchors"
Intelligent Emergence: How do you define Laifu? Many people say it is an "AI podcast".
Jiao Ke: I don't think what I'm doing is an AI podcast platform.
Laifu now has 15 Chinese AI anchors and 2 English AI anchors as we define them. Each "person" has a different style. Users often name a specific anchor when using the product.
Laifu highly emphasizes the human attribute. What we are actually creating are "people", that is, AI anchors.
Intelligent Emergence: Xiao Chuan (founder & CEO of Baichuan Intelligence) also said he wants to create "people".
Jiao Ke: We are very much aligned on this.
The once-booming Internet healthcare industry didn't achieve much in the end. The reason is that the Internet revolution is essentially a revolution in production relations. It solves the problem of efficiency, not productivity.
However, the biggest problem in Chinese healthcare is that there are only 4.4 million doctors, and even fewer good ones. Supply is seriously insufficient.
In early 2023, I had many conversations with Xiao Chuan downstairs from his home. He said at the time that he wanted to develop AI doctors. Why do we believe in AI healthcare? Because AI is, at its core, a productivity revolution. Using AI to create doctors can fundamentally solve the supply problem.
Intelligent Emergence: Is the problem in the audio field also a supply problem?
Jiao Ke: Yes. Recently I saw someone post: there are already so many human-hosted podcasts, so why should I listen to AI podcasts? In fact, the cost of producing audio by humans is very high, even higher than producing video.
For video, even if the host has an accent or the surroundings are noisy, you can add subtitles in post-production and viewing is unaffected. With audio, you can only listen, so the requirements for recording quality are very high. You need a recording studio, or at least a proper microphone. In post-production, you also have to edit out filler words, pauses, and repetitions.
The amount of audio humans produce is limited. For example, Xiaozhouyu gets about 500,000 episodes a year, an average of more than 1,000 new episodes per day, while humans produce tens of millions of videos every day. No one complains about too many AI videos. Why do people think there is too much AI audio?
Intelligent Emergence: Although the supply is low, do users really have so much demand for listening to audio?
Jiao Ke: Everyone has a lot of "ear time" every day, such as during the commute to and from work, while exercising, doing housework, or before going to bed.
A Deloitte report states that, excluding music, there are about 1.6 billion audio listeners worldwide. Moreover, audio is a high-frequency, essential need: users listen to audio at least once every two days.
Intelligent Emergence: Currently, there are two mainstream directions for AI applications: one is tools, and the other is platforms. Are these not the product forms you want to create?
Jiao Ke: The platform economy is a product form of the Internet. Tool-type products are actually part of platform services. For example, a platform that serves both creators and consumers provides creation tools to producers, and the content made with those tools is then supplied to consumers.
Many current AI products still look like platforms or tools, which puts them well within the reach of large companies.
The real product form in the AI era should be "people", such as scientists, doctors, and anchors. This is a product form that the Internet era does not have, but AI can achieve.
Intelligent Emergence: What is the product form of the created "people"?
Jiao Ke: The movie Her is a great product manager because it defines how a product interacts with users.
At the beginning, Samantha interacted with the male protagonist by actively helping him handle emails. Later, they developed feelings not through the male protagonist's active chats but by playing games and building blocks together. Achieving something together is real companionship.
A major problem with many AI companion products is that they rely heavily on users' active input. You need to keep talking to the AI, but most users don't have that much to say. So, in the end, only a small number of users with a strong desire to express themselves stay.
Intelligent Emergence: Why did you choose the audio field to create "people"?
Jiao Ke: Audio has a value that video lacks: it is conversational. It is the most natural form of human communication, and it readily carries emotion. In the past, radio stations had emotional call-in hotlines, but TV stations didn't.
The movie Her has had a great impact on this wave of AI startups. One of GPT-4o's voices was widely compared to that of the AI "Samantha" in the movie. Many people don't realize that from beginning to end, Samantha has no image, only a voice.
So voice is very important. Audio has a strong companion attribute. Images, videos, and robots haven't yet overcome the uncanny valley effect, but audio can. This is an important reason we focus on audio.
Moreover, audio is non-intrusive and non-exclusive. Watching a half-hour video is quite tiring because your eyes, hands, and ears are all occupied. Audio doesn't have that problem.
Over the next two years, I believe people will become less willing to take out their phones and tap through apps to socialize or search for information. Since machines can now understand human language, the interface for our future interactions may well become voice.
Intelligent Emergence: How did you design the functions of Laifu?
Jiao Ke: What Laifu does is similar to Her. It uses content provision as an entry point to interact with users. Users can not only listen to programs but also chat with AI anchors at any time.
We hope to create a feeling that you randomly enter a room where two anchors are talking about something you're interested in. You can sit quietly and listen, or participate in their discussion at any time.
During this process, you will form a connection with the anchors. Just like listening to a radio program, if the anchor changes, you'll feel uncomfortable.
Laifu can also quickly produce audio content according to your needs or preferences. For example, time-sensitive content might take a human podcast a week to produce, whereas an AI anchor can prepare it in less than an hour. This is the opportunity we've seen.
Intelligent Emergence: You've experienced the entire Internet cycle. What kind of thinking do you need to change when starting an AI business?
Jiao Ke: If you think AI is a new technological cycle, be very careful not to rely on Internet-era thinking.
At Baichuan, I talked with many product managers from the Internet industry, but they were still focused on how to build platforms, how to create a two-sided market, and how to spend on advertising.
However, the network effect doesn't exist in the AI era. Many entrepreneurs in the AI era say they want to build platforms, but platforms are a product form of the previous Internet era. The Internet doesn't change production but reduces transaction costs through the interconnection of producers and consumers.
That is why large Internet companies all adopt the platform economy model: by investing a large amount of capital in a short period, you achieve large-scale production and consumption simultaneously.
For example, with Didi, one side is drivers, and the other side is passengers. If only one side exists, the economic model of the Internet platform is ineffective. Once both sides are developed, the barrier of the Internet platform is established.
However, AI is a productivity revolution. Productivity directly produces goods or services, which results in a one-sided market. The advertising spending of some AI products in 2024 has shown that a one-sided market cannot be built by burning money. Once a better product appears, users are likely to switch.
Since AI is a new technological cycle, don't use Internet - era thinking to create products. Instead, create something that represents a generational leap.
Compared with DAU, I value DTU (Daily Talk User) more
Intelligent Emergence: Doubao is a product of a large company and has also launched an AI podcast function. It has a significant advantage in DAU. Will Doubao overshadow what you're doing?
Jiao Ke: Doubao is a tool - type product.
Intelligent Emergence: Many users have also established an emotional connection with Doubao.
Jiao Ke: Let's look at the proportion. Most users still regard Doubao as a search tool.