SmartVoice is going public, but the moat of AI voice is gone.
On May 25th, SmartVoice submitted an IPO application to the STAR Market.
As one of the earliest domestic AI companies, SmartVoice has been around for nearly two decades. Judging from its results, however, this established AI company is not growing quickly. From 2023 to 2025, its revenue was 539 million yuan, 601 million yuan, and 688 million yuan respectively, a two-year compound annual growth rate of only 12.98%.
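As a quick check on that growth figure, the calculation can be sketched in a few lines of Python. The function name `cagr` is only an illustrative helper, and the inputs are the rounded revenue figures quoted above, in millions of yuan.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between a starting and an ending value."""
    return (end / start) ** (1 / years) - 1


# Revenue in millions of yuan, as quoted in the article.
revenue = {2023: 539, 2024: 601, 2025: 688}

# Two compounding periods separate 2023 and 2025.
growth = cagr(revenue[2023], revenue[2025], years=2)
print(f"Two-year CAGR: {growth:.2%}")
```

With these rounded inputs the result comes out at roughly 12.98%, in line with the figure cited.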
To some extent, SmartVoice is at an awkward juncture.
Over the past decade or so, voice technology has been the most typical high-barrier segment in the field of artificial intelligence. Companies such as Cerence, SoundHound, and SmartVoice have all benefited from this wave of industry growth.
However, since the emergence of large models, the underlying logic of the industry has been rewritten. Tech giants such as OpenAI, Google, Alibaba, and ByteDance are turning voice into a basic, built-in capability.
So a question is posed to all traditional voice companies: When voice becomes a standard feature of large models, what value do they still have?
Today, let's talk about SmartVoice and the future of traditional voice AI companies.
High gross margin, but still hard to make a profit
In terms of revenue structure, SmartVoice has three main businesses: in-vehicle, smart office, and smart IoT.
Among them, the in-vehicle business is the company's core. In 2025, its revenue reached 276 million yuan, accounting for 40.08% of total revenue.
The in-vehicle business essentially provides voice interaction solutions for automakers. SmartVoice has entered the supply chains of many automakers, including BYD, Mercedes-Benz, and Volkswagen, and its share of in-vehicle voice installations has reached 22%.
The second business is smart office, which includes software services such as voice transcription, meeting recording, and free conversation, as well as hardware products such as intelligent ceiling microphones and AI office notebooks. From 2023 to 2025, revenue from this business increased from 180 million yuan to 243 million yuan, making it one of the company's fastest-growing segments in recent years.
In contrast, the smart IoT business has shrunk. From 2023 to 2025, its revenue decreased from 197 million yuan to 169 million yuan, and its share of total revenue fell from 36.63% to 24.51%.
From the perspective of profitability, SmartVoice's gross margin is not low.
As the proportion of software revenue rose, the company's gross margin increased from 53.69% in 2023 to 63.24% in 2025.
However, the high gross margin has not been converted into profit.
In the past three years, the company suffered losses of 136 million yuan, 158 million yuan, and 80 million yuan respectively. During the same period, the period expense ratios were as high as 76.3%, 79.5%, and 68.7%.
This is tied to the long-standing commercialization dilemma faced by the domestic software service industry.
Most of SmartVoice's businesses are still strongly project-based. Whether it is in-vehicle voice, smart office, or IoT solutions, each new customer often brings additional costs for R&D, adaptation, testing, deployment, and maintenance.
In the in-vehicle scenario especially, there are obvious differences between automakers, between models, and even between operating systems, which makes it difficult to replicate at scale the way standardized software can.
However, this is not SmartVoice's biggest problem. The real question is: when the multimodal capabilities of general-purpose models are strong enough, where does the value of voice suppliers lie?
Large models are devouring AI voice companies
Since last year, US software stocks have been falling sharply.
One of the most affected sectors is traditional voice service providers.
Since 2025, SoundHound AI has fallen from its annual high of $22.17 to about $8.56, a decline of 61.39%; Cerence has dropped from its high of $27.50 to $11.87, a decline of 56.84%; Agora has fallen from its high of $6.99 to $4.25, a decline of 39.20%.
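Those percentages are simply the drop from the high divided by the high. A minimal sketch of the arithmetic, using only the prices quoted above (the helper name `decline_from_high` and the `quotes` table are illustrative, not taken from any data source):

```python
def decline_from_high(high: float, current: float) -> float:
    """Fractional decline of the current price relative to the high."""
    return (high - current) / high


# (annual high, recent price) in US dollars, as quoted in the article.
quotes = {
    "SoundHound AI": (22.17, 8.56),
    "Cerence": (27.50, 11.87),
    "Agora": (6.99, 4.25),
}

declines = {name: decline_from_high(high, now) for name, (high, now) in quotes.items()}
for name, drop in declines.items():
    print(f"{name}: {drop:.2%}")
```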
Behind the falling stock prices, an increasingly obvious consensus is emerging: Voice technology itself is losing its independent value.
Over the past two decades, the voice industry has been built on a relatively clear industrial chain.
The standard architecture of traditional voice AI is a typical modular pipeline: ASR (Automatic Speech Recognition), NLU (Natural Language Understanding), a dialog manager, and TTS (Text-to-Speech), each adapted to the requirements of different scenarios.
In the past, many voice AI companies were valuable because every layer was difficult. Accent recognition, noise robustness, low latency, wake words, in-vehicle sound fields, the compressed audio quality of telephone lines, multi-speaker interruptions, and natural-sounding speech synthesis all required long-term engineering accumulation.
SmartVoice, YunZhiSheng, SoundHound, and Cerence are all beneficiaries of this era.
However, after the emergence of large models, this logic has begun to change. On the one hand, the improvement of model intelligence has brought stronger multimodal capabilities. On the other hand, large models have also integrated these originally scattered modules into a unified system.
At present, the voice capabilities of large models are rapidly catching up with and even surpassing those of traditional voice manufacturers.
In the past, the core metric in the voice industry was WER (Word Error Rate), which roughly measures how many words out of every 100 are recognized incorrectly. The lower the WER, the higher the recognition accuracy.
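For readers who want to see how such a number is produced, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system's output, divided by the number of reference words. The sketch below is a minimal illustration under that standard definition; the function name and the sample sentences are made up for the example and do not come from any vendor's test set.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row of the table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one dropped word and one substituted word.
reference = "turn on the air conditioner please"
hypothesis = "turn on air conditioning please"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}")
```

Here two errors against six reference words give a WER of about one third. Because insertions count as errors, the figure can exceed 100% on very poor output.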
Traditional voice systems can usually keep the WER within 5% in an ideal environment. However, once they enter complex scenarios such as in-vehicle noise, telephone lines, and multi-person conversations, the error rate often rises significantly.
For example, in 2025, the WER of SmartVoice's solution was about 4.8% in relatively clean scenarios such as news broadcasts, but it rose to 12.3% in a noisy in-vehicle environment.
In contrast, OpenAI's open-source Whisper Large-v3 not only achieved a lower error rate on the standard test set but also showed strong stability in real-world scenarios such as meetings, phone calls, and multi-person discussions.
The reason behind this is not complicated.
Traditional voice companies have long relied on high - quality labeled data. Although this type of data is accurate, the acquisition cost is high and the scale is limited. The total scale of industry corpora accumulated by many enterprises over more than a decade is only a few thousand to tens of thousands of hours.
Large models can be trained using public videos, podcasts, phone recordings, meeting records, subtitle data, and user feedback. Taking Whisper as an example, its training data scale reaches about 680,000 hours, far exceeding that of traditional voice systems.
A larger data scale not only exposes the model to more complex real-world scenarios but also gives it stronger contextual understanding.
In the past, voice systems worked more like keyword spotters, while large models can use context to understand what users really mean. Even when there are pauses, slips of the tongue, or incomplete sentences, they can correct and complete the content from the context.
In other words, traditional voice models grew up in the laboratory, while large models grew up in the real world.
This change is quickly spreading to the industrial level and bringing up a question:
If OpenAI, Google, Amazon, ByteDance, and Alibaba can all provide low-latency, high-accuracy voice interaction, customers will naturally ask: why do we still need to buy from a separate voice supplier?
To some extent, voice capabilities are becoming more like infrastructure than an independent product.
This trend has already begun to emerge.
In 2023, OpenAI partnered with Mercedes-Benz to integrate ChatGPT into its MBUX in-vehicle voice system. Google has also started to replace its original Google Assistant entirely with Gemini and is gradually integrating it into devices such as Android phones, Google TV, and smartwatches.
The same is true in China. Doubao has entered Tesla's in-vehicle system in the Chinese market, and Tongyi Qianwen has gradually taken over the voice capabilities behind Tmall Genie and extended them to smart home devices.
These changes have also brought a more serious problem to SmartVoice:
When voice gradually changes from an independent product to a basic ability, what value do traditional voice AI companies still have?
This article is from the WeChat official account “Silicon-based Observation Pro”. Author: Yuanyuan. Republished by 36Kr with permission.