The competition in MaaS is getting fiercer. Why is the market share of Volcengine still increasing?
Text by | Su Tianming
China's MaaS (Model as a Service) market is expanding rapidly, evolving from a very small and niche market into a business growth point full of potential.
The latest data from market research firm IDC shows that in 2025, the call volume of large models in China's enterprise-level MaaS market increased by 16 times year-on-year, reaching 194.1 trillion Tokens, and it is expected to grow even faster in 2026.
In 2025, especially in the second half of the year, almost all of China's cloud computing providers and large model companies entered the market, investing more computing power, sales, and product resources, elevating the priority of MaaS business, and the competition became increasingly fierce.
Normally, in a rapidly expanding emerging market, the leader's share tends to be diluted once latecomers pour in. This seemed especially likely in MaaS: outside observers once believed it was hard for large model APIs to build stickiness, since developers apparently needed to change only a few lines of code to swap the underlying model or switch cloud platforms.
However, the latest data from IDC presents a counterintuitive result: in 2025, the market share of Volcengine in China's MaaS market was very stable, rising from 49.2% in the first half of the year to 49.5% for the whole year.
That is to say, in the second half of the year, when competition was fiercest, Volcengine was not diluted by latecomers; it kept extending its lead as the market expanded. Nearly one out of every two large model Tokens generated on China's public cloud runs on Volcengine.
The outside world tends to attribute this to an aggressive pricing strategy. In May 2024, when Volcengine launched the Doubao large model MaaS service, it cut prices by 99.3% relative to the prevailing industry level. But simple subsidies cannot explain the continued expansion of Volcengine's share: other vendors quickly lowered their MaaS prices to a similar level. What really determines whether low prices can be sustained is call volume and inference engineering capability.
Model capabilities are also crucial. The rapid expansion of the MaaS market mainly comes from the continuous opening of new scenarios after the improvement of model capabilities: the improvement of model programming capabilities has promoted the popularity of Vibe Coding and Agent, and video generation models have entered the production processes of short dramas, comic dramas, and advertisements, continuously increasing Token consumption.
This means MaaS competition is largely a race for incremental demand: whoever productizes model capabilities faster and delivers cost-effective, stable service can capture new scenarios sooner and keep gaining share as the market grows.
From the Doubao large language model to the Seedance video generation model, the capabilities of the Doubao series of models have been continuously iterated. On this basis, Volcengine has accelerated the transformation of the accumulated Token volume into a more comprehensive competitiveness: lower inference costs, higher engineering efficiency, and the infrastructure required for Agent operation. A cloud computing flywheel in the era of large models is taking shape.
01 Behind the low price are scale and engineering capabilities
Cloud computing is a classic high-fixed-cost, low-marginal-cost industry. Servers, networks, R&D, and operations all require large upfront investment, but the marginal cost of each additional call keeps falling as volume grows. The larger the scale, the more easily R&D and infrastructure investments are amortized.
Scale also magnifies the value of engineering optimization. Tan Dai, the president of Volcengine, once gave an example: "Optimizing the utilization rate of 10,000 servers by one percentage point and optimizing that of 1 million servers by one percentage point result in a 100-fold difference in benefits. A powerful team can do it better."
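The quoted 100-fold gap is simple arithmetic: the absolute saving from a one-percentage-point utilization gain scales linearly with fleet size. A minimal sketch, with an invented per-server cost figure (illustrative, not from the article):

```python
# Back-of-envelope sketch of the scale effect Tan Dai describes.
# `cost_per_server` is an invented illustrative figure, not a real number.

def annual_saving(servers, cost_per_server=30_000, utilization_gain=0.01):
    """Yuan saved per year by raising fleet utilization by `utilization_gain`."""
    return servers * cost_per_server * utilization_gain

small = annual_saving(10_000)      # fleet of 10,000 servers
large = annual_saving(1_000_000)   # fleet of 1,000,000 servers
assert large / small == 100        # same one-point gain, 100x the benefit
```

The per-server cost cancels out of the ratio, which is why the 100x conclusion holds regardless of the actual figure.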
Scale is the most important variable that Volcengine focuses on when developing MaaS: it is not simply selling model interfaces but quickly increasing the Token call volume.
To this end, Volcengine uses Token consumption as the core indicator for business development and has adjusted the sales team's performance assessment: for the same sales volume, MaaS products carry several times the incentive weight of traditional cloud services in internal assessment.
Along with the elevated business priority, Volcengine has also increased its technological investment in model inference. MaaS cost mainly depends on Token generation efficiency: improving server utilization, cache hit rate, and compute scheduling efficiency creates room to cut costs.
"Lower costs can stimulate more applications and expand the market," Tan Dai said later when talking about the pricing strategy at that time. "Seeing that we could reduce the cost through technology, we decided to make a thorough reduction at once."
The key technologies behind Volcengine's price cut at the time were PD separation and KV Cache, both applied at scale relatively early. PD separation splits large model inference into "understanding the problem" (Prefill) and "generating the answer" (Decode) and matches each phase with better-suited compute units; KV Cache stores the intermediate states produced during generation so that the preceding context need not be recomputed every time new content is emitted, saving GPU memory bandwidth and inference cost.
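A toy sketch of why the KV Cache saves compute: without a cache, every decoding step reprocesses the entire prefix; with a cache, the prompt is prefilled once and each new token adds only one key/value computation. The counting below is a conceptual illustration, not Volcengine's implementation:

```python
# Conceptual illustration of the KV Cache idea in autoregressive decoding.
# We count key/value (K/V) computations only; all names are illustrative.

def kv_ops_without_cache(prompt_len, new_tokens):
    """Every step recomputes K/V for the prompt plus all tokens emitted so far."""
    return sum(prompt_len + step for step in range(1, new_tokens + 1))

def kv_ops_with_cache(prompt_len, new_tokens):
    """Prefill computes the prompt's K/V once; each decode step adds just one."""
    return prompt_len + new_tokens

# A 1,000-token prompt generating 200 tokens:
print(kv_ops_without_cache(1000, 200))  # 220100
print(kv_ops_with_cache(1000, 200))     # 1200
```

The same accounting also motivates PD separation: prefill is one large parallel pass that is compute-bound, while decode is many small sequential passes that are memory-bandwidth-bound, so the two phases suit different hardware configurations.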
However, these technologies all depend on scale. When the call volume is small, maintaining a complex cache and scheduling system also incurs costs, which may even offset the saved computing power.
As technologies such as PD separation and KV Cache spread through the industry, Token prices gradually converged. Followers lacking economies of scale often face greater cost pressure when matching low prices, and may even sell at a loss.
Volcengine, with a larger call volume, has less cost pressure and more room to continue optimizing inference technologies, forming a sustainable low-price ability.
Volcengine is also looking for ways to reduce costs beyond technology and engineering: on one hand, it offers tiered pricing by context length, giving customers a choice; on the other, it has launched a "savings plan" that pools a customer's usage across models such as language models and video generation models, so that scale discounts earned on language models can offset the trial-and-error costs of newer businesses such as video generation.
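As a sketch of how context-length tiered pricing works in principle: the tier boundaries and per-million-token prices below are invented for illustration, not Volcengine's actual price card:

```python
# Hypothetical context-length tiered Token pricing; all numbers are invented.

TIERS = [
    # (max context length in tokens, yuan per 1M tokens)
    (32_000, 0.8),
    (128_000, 1.2),
    (float("inf"), 2.4),  # everything longer falls into the top tier
]

def price_per_million(context_len):
    """Return the per-million-token price for a request's context length."""
    for max_len, price in TIERS:
        if context_len <= max_len:
            return price

def bill(requests):
    """requests: iterable of (context_len, tokens); returns the total in yuan."""
    return sum(tokens / 1_000_000 * price_per_million(ctx)
               for ctx, tokens in requests)

# Two short-context requests and one long-context request:
total = bill([(10_000, 2_000_000), (30_000, 1_000_000), (200_000, 500_000)])
print(round(total, 2))  # 3.6
```

The point of the tiering is that short-context traffic, which is cheaper to serve, is not forced to subsidize long-context traffic; customers who control their context length keep the lower rate.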
The latest IDC report on China's MaaS market notes that Volcengine holds the largest market share by call volume; its revenue share also ranks first, but a few percentage points lower than its call-volume share, meaning Volcengine's average price per Token is below the industry average.
It should be noted that IDC's statistics on China's MaaS market mainly cover the situation of enterprises calling models on public clouds, excluding AI applications such as Doubao and Jimeng developed by ByteDance, as well as Tokens generated when internal businesses such as Douyin and Feishu deploy large models.
These call volumes are not included in IDC's market share statistics, but they will also affect Volcengine's cost structure and engineering efficiency.
02 Agent turns MaaS into an infrastructure business
Sam Altman, the CEO of OpenAI, recently said in an interview that the next stage of AI will shift from "users providing a piece of text and the large model returning a piece of text or code" to "Agents truly running inside companies to complete various types of work." He said that OpenAI is also collaborating with AWS to develop a product similar to a "virtual colleague."
MaaS is evolving from a standardized supply of model interfaces to an enterprise infrastructure with stronger stickiness. For an enterprise Agent to truly operate, it requires components such as identity authentication, permission control, memory systems, tool calls, sandbox environments, log records, and security governance, as well as connections with the enterprise's internal systems.
This is also the core reason the large model industry has recently begun to emphasize the Agent Harness. "Harness" originally means a harness or rigging; in the Agent context it refers to the engineering system built around the base model. MaaS provides stable model capability, and the Harness turns inference into a controllable, traceable, and sustainable workflow.
The way cloud platforms provide large model services has also changed accordingly. Whether it is the cooperation between Anthropic and multiple cloud providers or the cooperation between OpenAI and AWS in April this year, it is not just putting the model interface on the cloud platform but encapsulating it into the cloud platform's native Agent environment, enabling enterprises to develop and operate production-level Agents there.
Volcengine's product evolution in the past few years can also be understood in this trend: while enhancing the competitiveness of MaaS, it expands large model services into an infrastructure covering the development and operation of Agents.
"We were the first in China to launch a full set of Agent products and to simplify Agent development," Tan Dai said in an interview at the end of last year. "Customers can build a complex Agent with a few lines of code, just as they once built a complex website. Now they just need new AI middleware."
In his view, writing code used to mean writing if-else statements to define workflows; now, when building Agents on top of models, developers mostly write prompts, and the model itself increasingly handles process planning, task decomposition, and spawning sub-Agents. This is also the underlying working logic of products like OpenClaw.
Therefore, at the beginning of this year, while supporting the CCTV Spring Festival Gala, Volcengine quickly launched ArkClaw, its OpenClaw product, strengthened its security capabilities, and open-sourced OpenViking, a context database designed for long-term Agent memory, making ArkClaw easier to use.
They define the "ArkClaw Personal Edition" as a "flexible Agent": employees first experiment quickly with ideas to improve business efficiency, and capabilities verified as effective are then consolidated into a "stable Agent." The latter corresponds to HiAgent, the Agent development and operation platform Volcengine launched in 2024.
By April this year, the number of enterprises consuming trillions of Tokens on Volcengine had grown from 100 at the end of last year to 140. More and more large MaaS customers are deepening their cooperation with Volcengine.
03 The AI cloud flywheel starts to spin
In business analysis, the flywheel effect is the core logic to explain the success of AWS, the world's largest cloud computing platform: scale spreads costs, price cuts attract more customers, and customer growth brings more feedback, cash flow, and a stronger ecosystem, driving the continuous iteration of technology and services.
Volcengine is building a similar flywheel in the AI era. However, its flywheel does not completely follow the logic of the traditional cloud computing industry. The flywheel of traditional cloud computing mainly revolves around computing power, storage, network, and software ecosystems; the MaaS flywheel adds model capabilities, Token usage patterns, Agent scenarios, and real-business feedback.
The first layer of Volcengine's flywheel is the cycle among model capabilities, call volume, and inference costs.
Seed, ByteDance's internal model R&D team, stably supplies first-tier models to Volcengine. The stronger the model, the easier it is to expand call volume; the larger the call volume, the more engineering can be applied to cut costs; and lower costs attract more customers. This is a scale flywheel much like traditional cloud computing's, except that the unit of measurement has changed from servers, storage, and bandwidth to Tokens.
The second-layer flywheel comes from feedback from real scenarios. Within the ByteDance ecosystem, hundreds of millions of people use Doubao every day, Jimeng is growing rapidly, and dozens of internal business lines such as Douyin and Feishu, along with external customers, all develop and use large model capabilities through Volcengine, providing it with high-frequency, complex, and real product feedback.
This feedback flows both to the Seed model team, helping the base model keep iterating, and to Volcengine's Agent team, helping improve product capabilities.
Agent products rely on this feedback in particular. Anthropic has also noted in several technical articles that gains in Agent capability do not come from model improvements alone: internal employees, external users, production monitoring, A/B testing, user research, and customer deployment requirements jointly drive the iteration of products such as Claude Code.
In 2025, Volcengine's nearly half share of China's MaaS market is just a phased result after its flywheel started to spin.
Now, the Agent boom continues to drive up market demand, and there has been a shortage of computing power in the industry. Some companies have chosen to raise prices to optimize their short - term financial performance. Volcengine said it will not follow suit.
This pricing restraint comes from Volcengine's judgment of the industry stage: compared with obtaining higher short - term profits, it is more important to expand the call volume, lower the usage threshold, and increase real scenarios to make the flywheel accelerate.
As Tan Dai said, the competition in AI cloud is a marathon, and only one kilometer has been run. The current market share does not represent the final outcome. "Acceleration is more important than speed."