
Baidu has created GMV with AI digital humans.

晓曦, 2025-06-18 21:43
Luo Yonghao is back again.

On the evening of June 15, the digital version of Luo Yonghao completed its first live-streaming session on Baidu E-commerce, attracting over 13 million views and generating a GMV (gross merchandise volume) of more than 55 million yuan. Sales in core product categories such as 3C products and food exceeded those of Luo Yonghao's first in-person live-streaming session in May, setting a new record for digital human live-stream selling.

While most AI companies are still competing for market share and users in the multi-modal field, Baidu has already delivered a perfect answer to the industry with its more advanced, highly persuasive digital human technology: creating GMV with AI.

But how exactly is a digital human with such strong selling power produced, and how is a highly persuasive digital human realized? Baidu recently revealed the answers officially.

Baidu Has Created GMV with AI

2025 is undoubtedly the year of intelligent agents. Alongside the general-purpose agents that were popular earlier, more and more vertical agents focused on scenarios such as office work and design have emerged and captured people's attention.

In the view of industry insiders, Baidu's highly persuasive digital human is essentially a super intelligent agent, and its usability is evident from the results of this live-streaming session.

36Kr learned that the "Digital Luo Yonghao", the first cooperation between a top-tier digital human anchor and Baidu E-commerce, is supported by the digital human live-streaming technology of Baidu Huibo Star. This highly persuasive digital human technology was first launched in April this year. Its appearance, expression, and voice are as well coordinated as a real person's, it can think and make decisions, and it can collaborate to complete specific tasks. In that sense, it is essentially a super intelligent agent.

Ping Xiaoli, vice president of Baidu and general manager of Baidu E-commerce, also said, "Baidu's digital version of Luo Yonghao has achieved a sensory effect comparable to that of top-tier anchors. With the development of large models and multi-modal capabilities, digital humans have great potential to surpass real people in the future."

According to Baidu, the digital version of Luo Yonghao is a new-generation, highly persuasive digital human from Huibo Star. Trained on a large amount of Luo Yonghao's data, it has been upgraded across the board in scripts, actions, voice, text, Q&A, and interaction. The well-matched interplay between the two digital humans, their ability to pick up and play along with jokes, and more frequent interactive Q&A all deepen users' sense of immersion. According to Baidu, the result is four major breakthroughs, in experience, content, vision, and effect, as well as several industry firsts.

Ping Xiaoli shared many of the user comments she saw. The most common feedback was that the digital human looked very realistic, and many users asked in the live-streaming room whether Luo was an AI. Ping Xiaoli believes the positive feedback on the digital version of Luo Yonghao, a benchmark IP, shows that users are more willing to accept and recognize digital humans. Huibo Star's digital human is no longer just an AI tool that helps merchants cut costs and raise efficiency. It also brings users a brand-new experience and offers a new way of matching people with goods. "This is a new milestone, marking that intelligent e-commerce has entered a new chapter."

No wonder Luo also expressed his approval on Weibo and in videos, saying "perfect ending", "amazing", and "really admirable". When asked how he felt about his digital version, Luo Yonghao, the chief experience officer of Huibo Star, said it exceeded his expectations: "Digital human live-streaming may represent a new trend in e-commerce live-streaming."

Undoubtedly, this was a special live-streaming session meant to "show off strength". Its GMV exceeded that of Luo Yonghao's own live-streaming session in May. It also received more attention than the earlier session, which in turn aroused outside curiosity about the underlying technology.

As a super intelligent agent, the digital human created by Baidu Huibo Star demonstrates very comprehensive capabilities. It provides a digital human anchor whose appearance, expression, and voice are highly consistent, and its AI brain improves conversion rates through multi-agent scheduling, allowing one person to function like a whole live-streaming team. It is reported that merchants using Huibo Star have seen the live-streaming conversion rate rise by an average of 31% and the live-streaming cost fall by 80%. Baidu attributes this to the full-stack, self-developed capabilities behind Huibo Star, which are tuned for the best user experience.

Wu Tian, vice president of Baidu Group, also said at the open-day event, "In terms of architecture, Baidu's AI technology has always been developed in a full-stack manner. In terms of modality, it has also been developed in a full-modal way. The three technologies of language, speech, and vision have all gone through years of development and accumulation. Now is a very good time to move from single modality to multi-modal integration."

Long-term technological accumulation has enabled Baidu to embrace this moment of change earlier than others. The live-streaming results of the digital version of Luo Yonghao on Baidu E-commerce have created an opening in the field of intelligent agents. In the future, scale can push costs down to a minimum, directly addressing the two major pain points of the intelligent agent track.

While the industry is still caught in the debate over multi-modal routes, Baidu has already started creating GMV with AI.

Making a Direct Move in the Field of Digital Humans

As Robin Li said, digital humans are a comprehensive manifestation of Baidu's large models in the multi-modal field. Because they focus on a specific vertical, digital humans have several advantages over general video-generation models: more precise model optimization, the pursuit of the best possible human-machine interaction experience, real-time interactivity, a relatively low technical threshold, clear application scenarios, and business models that are easier to replicate. The long live-streaming session of the digital version of Luo Yonghao is the best demonstration of Baidu's multi-modal capabilities.

It is reported that during the entire live-streaming session, the AI accessed the knowledge base 13,000 times and generated 97,000 words of product explanations, while the two digital human partners performed over 8,300 actions.

At the communication meeting, Wu Tian, vice president of Baidu Group, also explained and demonstrated the technical logic behind the digital human.

Baidu's multi-modal collaborative digital human solution rests on five innovative technologies: script-driven multi-modal collaboration of digital humans, script generation that integrates multi-modal planning and in-depth thinking, real-time interaction with dynamic decision-making, text-controlled speech synthesis, and high-consistency, ultra-realistic long-video generation of digital humans. Together they unify the "spirit, form, sound, appearance, and speech" of the digital human. The result is an ultra-realistic digital human that is highly expressive, delivers attractive content, and interacts freely with people, objects, and the environment.

First of all, the language model is the core driver. It generates the "script", which then guides the speech and vision models in multi-modal coordination and dynamic interaction.

The quality of the script is therefore crucial to the digital human's performance. In a digital human live stream, the most important element is the spoken lines. These lines also differ from one digital human to another, reflecting each one's distinct style, realistic persona, and engaging content.

This is exactly where high-quality, human-like script generation comes in. It gives the digital human a distinct personality and an interesting language style, just like a real-life anchor. All of this puts the language model's ability to the test.

During script generation, visual and speech tags are also produced. They guide the speech model in adjusting intonation and the visual model in aligning lip movements and expressions, so that the digital human looks more natural and fluent. The digital human can also interact with users dynamically in real time, according to the popularity of the live-streaming room and users' feedback.
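To make the idea of a tagged script concrete, here is a minimal Python sketch. It is not Baidu's implementation: the `ScriptSegment` class, the tag names (`intonation`, `expression`, `gesture`), and the two request-building functions are all hypothetical, chosen only to show how one script line could carry separate hints for a speech model and a visual model.

```python
from dataclasses import dataclass, field


@dataclass
class ScriptSegment:
    """One script line plus hypothetical control tags for each modality."""
    speaker: str
    text: str
    speech_tags: dict = field(default_factory=dict)  # e.g. intonation hints
    visual_tags: dict = field(default_factory=dict)  # e.g. expression, gesture


def to_speech_request(seg: ScriptSegment) -> dict:
    """Build the input a (hypothetical) speech model would receive."""
    return {"speaker": seg.speaker, "text": seg.text, **seg.speech_tags}


def to_visual_request(seg: ScriptSegment) -> dict:
    """Build the input a (hypothetical) visual model would receive.

    The same text is passed along so lip movements can be aligned to it.
    """
    return {"speaker": seg.speaker, "lip_sync_text": seg.text, **seg.visual_tags}


# Illustrative two-host script; the lines and tags are made up.
script = [
    ScriptSegment(
        "host_a",
        "This speaker is tiny, but listen to that bass.",
        speech_tags={"intonation": "excited"},
        visual_tags={"expression": "smile", "gesture": "hold_up_product"},
    ),
    ScriptSegment(
        "host_b",
        "Wait, how much is it again?",
        speech_tags={"intonation": "curious"},
        visual_tags={"expression": "raised_eyebrows"},
    ),
]

speech_requests = [to_speech_request(seg) for seg in script]
visual_requests = [to_visual_request(seg) for seg in script]
```

The point of the sketch is the single source of truth: because both requests are derived from the same segment, the voice and the face are always driven by the same line and the same intent.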

For a digital human live stream to deliver a high-quality experience, the key steps after the script are speech synthesis and video generation.

For speech synthesis, a text-controlled large speech-synthesis model reproduces the speaker's voice with high fidelity. Combining the live-streaming lines with the speaker's characteristics, it synthesizes a natural, fluent voice in an appropriate style. To solve the difficulty of coordinating the voices of the two digital humans in Luo Yonghao's live-streaming session, Baidu used a dialogue context encoder that performs unified inference over the historical dialogue input and the current dialogue. This produced the natural conversation between the digital versions of Luo Yonghao and Zhu Xiaomu seen in the live-streaming room.
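The article does not describe the dialogue context encoder's internals, and a real one would be a learned model. The sketch below therefore illustrates only the input side, under stated assumptions: recent turns and the line to be spoken are assembled into one sequence so that a downstream model can condition on both at once. The function name, the `<speaker>` markers, and the `|target` suffix are hypothetical.

```python
def build_tts_context(history, current, max_turns=4):
    """Join recent dialogue turns and the current line into one input sequence.

    history: list of (speaker, text) tuples, oldest first.
    current: (speaker, text) tuple for the line to be synthesized.
    max_turns: how many of the most recent history turns to keep.
    """
    # Guard against max_turns == 0, since history[-0:] would return everything.
    recent = history[-max_turns:] if max_turns > 0 else []
    parts = [f"<{speaker}> {text}" for speaker, text in recent]
    speaker, text = current
    # Mark the line that should actually be spoken.
    parts.append(f"<{speaker}|target> {text}")
    return " ".join(parts)


# Illustrative exchange between two hosts; the lines are made up.
history = [
    ("luo", "Let's look at the next item."),
    ("zhu", "It's the one everyone has been asking about."),
    ("luo", "Exactly, and the price is hard to beat."),
]
context = build_tts_context(history, ("zhu", "So how much is it?"), max_turns=2)
print(context)
```

Feeding the previous turns together with the target line is what would let a synthesis model match the tone and pacing of a reply to what was just said, rather than voicing each line in isolation.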

Most generated videos on the market today last 10 or 20 seconds, whereas digital human live streams are often measured in hours. Baidu has therefore built a complete set of technologies for digital human image generation and driving.

This technology is a controllable long-video generation process. Using features such as video, script, language, and skeleton, it combines multi-modal video understanding, cross-modal signal generation, and video generation to produce long digital human videos with high consistency.

Not Chasing Super-Apps, but Creating Super-Useful Ones

In April this year, when Baidu launched its highly persuasive digital human technology, Robin Li said with emotion at the event, "One of the most exciting breakthrough applications in 2025 is the AI digital human." He also explained at the time, "The highly persuasive digital human launched by Baidu has the characteristics of ultra-realistic voice and appearance, more professional content, and more flexible interaction, and has huge application potential in fields such as e-commerce live-streaming, gaming, and consumption."

In fact, when it first bet on the large-model business, Baidu put forward a distinctive view: it is not aiming to launch a "super-app", but to help more people and enterprises create millions of "super-useful" applications. Digital humans are exactly such a "super-useful" application in today's e-commerce industry.

Before the digital version of Luo Yonghao started live-streaming on Baidu, the industry had been discussing whether Luo Yonghao would make a high-profile comeback and join Baidu to replicate his outstanding achievements on other platforms.

The result far exceeded the industry's expectations. More importantly, although the anchor is a digital human, its user conversion is comparable to that of a real person.

When a user asked in the live-streaming room about the recent hot topic "Su Chao" (the Jiangsu city football league), the digital version of Luo Yonghao responded, "I know Su Chao has been quite popular recently. There are many popular jokes like 'first in the game, fourteenth in friendship'. I suggest the Chinese national football team learn from it." The exchange was very smooth.

It is reported that this live-streaming session was watched by over 13 million users. User interaction was three times as high, the average viewing time per user rose by over 30%, the order volume was 150% higher than that of the real-person live-streaming session, and the number of users who placed orders was 230% higher than for the real-person anchor. These figures suggest that users' acceptance of digital humans has been verified.
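Percentage increases of more than 100% are easy to misread, so the short Python sketch below converts the reported figures into multiples of the real-person session. It assumes that "N% higher" means an increase of N% over the baseline; the article does not state the baseline values themselves, so only ratios are computed. The function name is illustrative.

```python
def multiple_of_baseline(percent_increase):
    """Convert "N% higher than the baseline" into a multiple of the baseline."""
    return 1 + percent_increase / 100


# Figures as reported in the article, read as increases over the
# real-person session (an assumption about the wording).
reported_increases = {
    "order volume": 150,
    "users who placed orders": 230,
}

for metric, pct in reported_increases.items():
    ratio = multiple_of_baseline(pct)
    print(f"{metric}: +{pct}% -> {ratio:.1f}x the real-person session")
```

Under that reading, the digital human session produced about 2.5 times the orders and about 3.3 times the number of ordering users of the real-person session.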

Wu Chenxia, head of Baidu E-commerce's business department and its digital human innovation business department, also revealed the secret behind the success of the digital version of Luo Yonghao: Baidu trained and generated the digital human anchors on a large amount of data from Luo Yonghao and Zhu Xiaomu. At the same time, it customized the live-streaming script to the characteristics of the products and to Luo Yonghao's persona. The result unifies appearance, expression, and voice across all modalities and reproduces the habitual actions and expressions of Luo Yonghao and Zhu Xiaomu, making the digital humans as natural as real people.

"In many scenarios, we were worried that digital humans could only read the script but couldn't keep users watching. This live-streaming session proved that they can," Wu Chenxia explained to us.

If one Luo Yonghao can achieve such results, more anchors may have the opportunity to enjoy the efficiency and convenience that come as the technology becomes widespread.

It should be noted that in the past, because the technology was immature, digital humans performed poorly and the experience for ordinary consumers was far from satisfactory. Many platforms prohibited digital human anchors from live-stream selling and similar activities.

However, Baidu E-commerce has opened up a new market with its mature technical architecture. It has not only solved the problem of commercializing multi-modal technology but also found a more scientific and promising technical direction for the live-streaming e-commerce industry. On many e-commerce platforms, digital human live-streaming is changing from an option to a necessity.

At this communication meeting, Baidu E-commerce launched two plans. The Dream Butterfly Plan will double the number of top-tier digital human anchors on Baidu Preferred through traffic support, the creation of top-tier digital human anchors, and budget support. The Starry Sky Plan will add another 100,000 Huibo Star digital humans, invest 100 million yuan in digital human consumption subsidies, and provide tens of millions of yuan in operational support to help more ordinary people and small and medium-sized enterprises start digital human live-streaming.

This may be just a new beginning. According to Ping Xiaoli, Baidu divides digital humans into four stages. "In the 1.0 stage, only the appearance of virtual humans was simply realized, but the actions were stiff and the voice was obviously mechanical. The 2.0 stage is the ultra-realistic digital human, which achieves high-precision cloning of the human image, supports large movements, gets rid of the paper-cutout effect, and can generate spoken scripts and interact with the audience. Currently, mainstream digital humans are at this stage. Baidu's highly persuasive digital human has brought AI digital humans into the 3.0 stage."
