Renowned Sound Engineer Enters the Smart Voice Sector with an AI Emotional Performance System

Jiemu Sound Technology Launches RMB 3 Million Angel Round to Develop AI Emotional Performance System

Jiemu Sound Technology Launches Angel Round, Seeking High-Quality Investors in the AI Application Field

Jiemu Sound Technology, which focuses on the R&D of AI emotional performance technology, has recently opened a 3-million-yuan angel round; investors have not yet been determined. All funds from this round will go to core technology R&D, team building, and market expansion, with the aim of opening up the B2B market by capitalizing on the overseas boom in AI-powered comic dramas and short dramas. Unlike the common AI dubbing tools on the market, Jiemu Sound Technology positions its product as an "in-the-loop director" AI emotional performance system, emphasizing that machines should understand and perform complex human emotions rather than simply convert text to speech.

Exploding Content Production Capacity Drives Upgrades, While Professional-Grade AI Emotional Performance Remains in Short Supply

Content sectors such as overseas-bound short dramas and AI-powered comic dramas are growing rapidly. According to industry data, the overseas micro-short drama market is projected to exceed USD 4 billion in 2025, a growth rate of over 126%; the number of AI-powered comic dramas launched in the first half of the year alone reached 50,000, and the annual market is expected to exceed 20 billion yuan. Yet while production capacity is exploding, the supply of high-quality emotional performance is severely insufficient. General-purpose AI dubbing products lack accuracy in emotional expression and cannot meet the professional standards for reproducing human vocal emotion that film, television, and short dramas require. Traditional human dubbing is costly and slow to deliver, making it hard to keep pace with the dozens of episodes per day that overseas-bound short dramas produce.

Jiemu Sound Technology believes that as the content market shifts from "competing on quantity" to "competing on quality", the ability to convey emotions accurately will become a key bottleneck in the content production chain, and an AI performance system that can do so will be essential. Its product is positioned not as a "dubbing tool" but as an "AI emotional performance system"; the core difference lies in the "in-the-loop director" architecture and the codified modeling of human performance logic.

The "In-the-Loop Director" Architecture Enables Controllable Emotions, and Key Breakthroughs Are Verified in Real Projects

The core technology of Jiemu Sound Technology is the "in-the-loop director" AI emotional performance system, which adopts an architecture of "offline preprocessing + parallel generation at runtime". The system first performs a structured analysis of the complete script and establishes character profiles and scene-level emotional constraints to form a global understanding. At runtime, a large language model generates structured emotional tags (including emotion type, intensity, accents, pauses, and breaths), which a self-developed controller converts into underlying acoustic parameters that drive the speech synthesis engine to produce the dubbing audio. The system also includes a three-level director intervention mechanism, allowing users to exercise fine-grained control at three points: character profile correction, confirmation of emotional intensity boundaries, and single-sentence parameter fine-tuning.
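The flow from structured emotional tag to acoustic parameters can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the `BASELINES` values, and the linear scaling are hypothetical and are not taken from Jiemu Sound Technology's actual controller, which the company has not published. The `clamp_intensity` step stands in for the director confirming an emotional intensity boundary.

```python
from dataclasses import dataclass, field, replace


@dataclass
class EmotionTag:
    """Structured emotional tag for one line (hypothetical field names)."""
    emotion: str                                   # emotion type, e.g. "anger"
    intensity: float                               # 0.0 (flat) to 1.0 (extreme)
    accents: list = field(default_factory=list)    # words to stress
    pauses: dict = field(default_factory=dict)     # word index -> pause in seconds
    breath: bool = False                           # audible breath before the line


@dataclass
class AcousticParams:
    pitch_shift: float   # semitones relative to the neutral voice
    energy_gain: float   # loudness multiplier
    rate: float          # speaking-rate multiplier


# Hypothetical per-emotion baselines: (pitch, energy, rate) at full intensity.
BASELINES = {
    "anger": (2.0, 0.5, 0.15),
    "sadness": (-2.0, -0.3, -0.2),
    "neutral": (0.0, 0.0, 0.0),
}


def clamp_intensity(tag: EmotionTag, lo: float, hi: float) -> EmotionTag:
    """Director intervention: keep intensity inside the confirmed boundary."""
    return replace(tag, intensity=min(max(tag.intensity, lo), hi))


def to_acoustic(tag: EmotionTag) -> AcousticParams:
    """Toy controller: scale the emotion baseline linearly by intensity."""
    pitch, energy, rate = BASELINES.get(tag.emotion, BASELINES["neutral"])
    return AcousticParams(
        pitch_shift=pitch * tag.intensity,
        energy_gain=1.0 + energy * tag.intensity,
        rate=1.0 + rate * tag.intensity,
    )


# A tag as the language model might emit it, then bounded by the director.
raw_tag = EmotionTag(emotion="anger", intensity=0.9, accents=["never"])
bounded_tag = clamp_intensity(raw_tag, lo=0.2, hi=0.5)
params = to_acoustic(bounded_tag)
```

In this sketch the director's boundary is applied before the controller runs, so the downstream synthesis engine never receives an intensity outside the approved range; a production system would presumably use a far richer parameter set.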

Notably, the system has been tested in a real production environment. A director friend used Seedance 2.0 to generate the video for his AI-powered comic drama entry but found that the built-in dubbing could not meet the performance requirements of the character (a middle-aged shrew). He then used Jiemu Sound Technology's system to dub all of the lines. The company cites this case as evidence that it can deliver commercial-grade results on the "emotional accuracy" problem that competing tools could not handle.

According to internal blind tests on a 2,000-word, multi-character long-text dubbing scenario, Jiemu Sound Technology's subjective score for emotional coherence reached 4.5 out of 5, about 61% higher than the 2.8 out of 5 scored by IndexTTS2, a publicly available emotion-controllable TTS model. Accent execution accuracy reached 92%, and pause duration error was kept within ±0.08 seconds. During product testing, a number of film and television dubbing directors and producers rated the output as "surpassing ordinary dubbing actors". More than three senior film and television producers and investors have clearly stated their intention to try the product after launch.
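The "about 61%" figure follows directly from the two scores quoted above. The short Python check below reproduces that arithmetic and shows what a ±0.08-second pause tolerance means in practice; the function names and the sample pause durations are illustrative assumptions, not part of the company's test procedure.

```python
def relative_gain(score: float, baseline: float) -> float:
    """Relative improvement of `score` over `baseline`."""
    return (score - baseline) / baseline


def pause_within_tolerance(target_s: float, actual_s: float,
                           tol_s: float = 0.08) -> bool:
    """True if the generated pause is within +/- tol_s of the target."""
    return abs(actual_s - target_s) <= tol_s


gain = relative_gain(4.5, 2.8)
print(f"{gain:.1%}")  # 60.7%, which the article rounds to about 61%

# Hypothetical pauses: a 0.50 s target rendered as 0.55 s passes, 0.60 s fails.
print(pause_within_tolerance(0.50, 0.55))  # True
print(pause_within_tolerance(0.50, 0.60))  # False
```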

The Project Workstation System Fits Industry Practice, and the Business Model Points to a Data Flywheel

For its business model, Jiemu Sound Technology has introduced a "project workstation system": customers purchase AI dubbing workspaces on a per-project basis. Within a workstation, generation and modification are unlimited and multiple candidate outputs are available, with no additional fees charged before the final version is settled. Pricing is set at 40% to 60% of the industry's rate for human dubbing. Workstations are priced at 6,800 yuan per drama for short dramas, 5,800 yuan per episode for TV dramas, 3,500 yuan per hour for radio dramas, and 35,000 to 58,000 yuan per film for film and television projects. Value-added services such as professional voice packs, customized voice cloning, and director-level tuning are also offered. The team plans to use the high-quality emotional voice data accumulated through commercialization to gradually train a more powerful "empathy" model, and to extend its technical capabilities to general human-machine interaction scenarios such as virtual humans and intelligent assistants.

A Team with Composite Backgrounds Is in Place, and the Core Technology Patent Is Under Review

Li Tian, the founder of Jiemu Sound Technology, served as sound recorder for "Ne Zha: Birth of the Demon Child" and dialogue director for "Leap", and won the title of Outstanding Dialogue Director at the 2nd China Film Industry Week. He has also worked as a sound supervisor at NetEase, Ximalaya, and Dramawave, an overseas short-drama platform under Kunlun Tech. He is one of the very few entrepreneurs in China who combine a top-tier professional background in film and television sound with an AI product mindset. Shi Zhenyu, the full-stack engineering lead, is a former front-end expert engineer at Alipay, where he was responsible for quality control of the Five Fortunes and 618 promotional campaigns. Lin Zhanjie, the technical partner, is a former AI chip engineer at Canaan Creative with extensive experience in independent AI application development.

The core technical framework of the project is now in place. A patent application for the core invention has been filed and has entered the pre-examination fast track of the Beijing Intellectual Property Protection Center, and several supporting patent applications have been submitted in batches. Comparison demos based on segments from "Dying to Survive" and "Ne Zha: Birth of the Demon Child" have been produced. Once the financing is in place, candidates already lined up for speech algorithm, business development, and product promotion roles will join immediately.

Jiemu Sound Technology is currently at the angel round stage and has not yet completed its official product launch or commercialization. The 3-million-yuan financing in this round will be used for team building, core data asset construction, computing resources, and product development. The company plans to launch a beta version of the product within 6 to 9 months, sign its first batch of benchmark customers, and establish stable cash flow. With the explosive growth of content sectors such as overseas-bound short dramas and AI-powered comic dramas, market demand for professional-grade AI emotional performance systems is expanding rapidly. Jiemu Sound Technology is expected to fill this market gap with its technological first-mover advantage and establish a leading position in the sector.

This article was originally produced by jamsound. For reprints or content cooperation, see the Reprint Instructions; unauthorized reprinting will be pursued.
