MaaS revenue increased 15-fold in five months, and Alibaba Cloud has found a way to expand its Token business.
Text by | Deng Yongyi
Edited by | Zhang Yuxin
“The Token revenue of Alibaba Cloud's MaaS business increased 15-fold in the first five months of 2026, and monthly Token revenue now runs to hundreds of millions of yuan.” These are the latest figures Alibaba Cloud gave at its press conference on May 20th. The most direct driver of this growth comes down to one word: Agent.
That day, Alibaba Cloud released a series of products, including its new-generation flagship model Qwen 3.7 Max, just a month after the release of Qwen 3.6 Max.
Why the hurry? Since OpenClaw took off in February this year, model makers have been racing to improve their models' coding capabilities for Agents. As competition over coding intensifies, Alibaba also needs a model that can compete in coding to keep its MaaS business competitive. “The future belongs to the era of Agent Cloud,” said Liu Weiguang, President of Alibaba Cloud's Public Cloud Division.
Coincidentally, Google I/O took place on the same day across the Pacific in Silicon Valley, also themed around Agent Cloud. Almost every product Google announced, from chips to models to applications, centered on Agents.
Agent Coding has become the global consensus in AI.
△Image source: Alibaba Cloud
Going all in on Agents
The first thing unveiled at the press conference was the new Qianwen Cloud website, designed for Agents. It is also the first standalone official website Alibaba Cloud has launched for a single business in its 17-year history.
△Image source: Qianwen Cloud
“Qianwen Cloud is designed for Agents, not humans,” said Liu Weiguang. This reflects an assessment Alibaba Cloud made internally at the end of 2024: the main users of cloud products will gradually shift from human engineers to Agents.
In the past, a developer or enterprise that wanted to deploy services on the cloud had to open the official website, register, face hundreds of product categories, choose a machine type, configure the network, launch instances, set up the environment, and wire up the APIs themselves. Every step required a human engineer's judgment, and the barrier to entry was high.
With the Qianwen Cloud website, the process changes: an Agent first looks for models, then tools and skills, and finally the underlying cloud resources. The order is reversed.
For example, after OpenClaw (known in China as “Longxia”, or “lobster”) took off, Alibaba Cloud found that Agents could activate cloud computing resources on their own within a day, a job that used to take human engineers two weeks. “In the future, humans won't need to activate cloud resources. Agents will do it automatically in the background,” said Liu Weiguang.
The official website is just an appetizer. Alibaba Cloud has reworked its entire stack around Agents, from the models at the top to the Infra and chips underneath.
First came the new-generation flagship model, Qwen 3.7 Max.
Although Alibaba has built strong influence and a good reputation in open source, its flagship model has gained less from the OpenClaw boom than domestic rivals such as Zhipu's GLM and Kimi.
With Qwen 3.7 Max, Alibaba is trying to gain an edge in coding.
On what is currently the industry's most authoritative coding benchmark, Qwen 3.7 Max has caught up with the strongest version of DeepSeek. On a harder benchmark of complex engineering tasks, Qwen 3.7 Max ranked first.
△Qwen3.7-Max can independently execute long-horizon complex tasks lasting up to 35 hours, and its number of tool calls ranks near the top among major models. Image source: Alibaba Cloud
Compared with the previous generation, Qwen 3.6 Max Preview, the biggest upgrade in Qwen 3.7 Max is a much stronger ability to handle long-horizon tasks. Agents can independently execute complex tasks spanning dozens of hours and thousands of steps without human intervention.
The stronger this long-horizon ability, the more complex the tasks Agents can complete on their own and the less human intervention they need. It is also the core competitive dimension of today's strongest Agent products, such as Claude Code and Gemini Deep Research.
Zhou Jingren, CTO of Alibaba Cloud, gave an example: on T-Head's new chip platform, Qwen3.7-Max self-evolved a key kernel for the platform through independent programming and more than 1,000 tool calls, making inference 10 times faster than the original version.
In other words, the model can fix code defects on its own like an experienced engineer and help engineers build complex features.
Completing such tasks also depends on chips and Infra adapted to them. At the chip level, Alibaba Cloud's new-generation AI chip Zhenwu M890, which handles both training and inference, and its self-developed interconnect chip ICN Switch 1.0 are both deployed in super-node servers aimed at large-scale concurrent Agent workloads.
Shipments of Alibaba's T-Head PPU chips now exceed 540,000 units, and the chips have begun serving inference for AI applications such as Wukong and Miaowu.
How to expand the Token business
The Agent boom has triggered a boom in Token consumption. Agent work is essentially code generation, and a single task consumes ten or even a hundred times as many Tokens as an ordinary chat.
As a result, the consensus on Agents has turned into open warfare in the model market: whichever model is called most often in Agent scenarios quickly turns that into revenue. The biggest winner so far is undoubtedly Anthropic. According to the Wall Street Journal, Anthropic's revenue is expected to more than double in the second quarter, reaching $10.9 billion.
△Image source: Wall Street Journal
Alibaba Cloud has also benefited. In 2025, its annual revenue exceeded 146.6 billion yuan, up 28.6% for the year, driven mainly by AI products.
Wu Yongming, CEO of Alibaba, said on last week's earnings call that annual recurring revenue (ARR) from AI models and application services, including the Bailian MaaS platform, will exceed 10 billion yuan in the June quarter and reach 30 billion yuan by the end of the year.
In this Token war, however, Alibaba and ByteDance have chosen different strategies.
“Token revenue comes mainly from two sources: large language models, led by coding, and video models. In the past, many people lumped together the Token growth in these two markets, which is a mistake,” Liu Weiguang emphasized.
ByteDance dominates the video model market. According to research firms, since Seedance 2.0 took off, ByteDance has accounted for 80% of daily Token consumption in the video model market. At the end of 2025, Volcano Engine (Huoshan), ByteDance's cloud unit, set a goal of more than 10 billion yuan in MaaS revenue for 2026. Since Seedance 2.0 took off, that goal has been raised again.
Alibaba Cloud, by contrast, has the edge in large language models. “Any company with developers needs the cloud, so almost all of Alibaba Cloud's existing customers, all of whom have developers, are potential coding users,” said Liu Weiguang.
At the end of 2025, Alibaba Cloud set a business goal of “capturing 80% of the incremental AI cloud market in 2026,” and it has concentrated its current efforts on coding. “For the first five months of this year, we can say that Alibaba Cloud has captured 80% of the incremental LLM market.”
To reach this goal, Alibaba Cloud has also changed how it evaluates sales performance. What matters is not who sells the most Tokens, but who sells the most valuable ones.
Simply put, Alibaba Cloud is not chasing Token consumption from simple chats, because prices for that kind of usage have hit rock bottom.
Instead, a core metric for Alibaba Cloud now is how many of a customer's core business systems are connected to the model. Alibaba Cloud wants the Tokens it sells to be used for writing code, making decisions, and running processes. Once Tokens enter an enterprise's core production processes, consumption grows exponentially, unit prices are higher, repeat purchases are steadier, and the revenue is of correspondingly higher quality.
This is because coding consumes Tokens differently from video. Video consumption is one-off: once a video is generated, the task is done.
Coding is more like a self-evolving cycle: the model writes code, the code becomes an application, the application is deployed on the cloud, and when it runs it calls the model again, which then generates more code.
Competition among large models is now a full-stack systems-engineering contest. How tightly chips, Infra, and models are coupled has become the biggest factor in the efficiency of both training and inference serving. Commercial competition is accelerating too, quickly validating which scenarios have value and feeding what is learned back into the models.
“Chips, models, and the cloud now mesh like gears, driving one another upward in a spiral,” said Liu Weiguang. “If we can make each chip generate more and higher-quality Tokens than our competitors' chips do, we will win.”
Below are further remarks from Liu Weiguang on Alibaba Cloud, the Agent trend, and the Token war, edited by “Intelligent Emergence”:
1. The ceiling of cloud computing has been raised again by Agents
In the traditional cloud era, our business model was fairly simple, but we had a long-standing pain point: when sizing a customer's IT budget, we could not reach the money enterprises spent on in-house software development and outsourced labor. Now the situation has flipped, and those budgets are exactly what AI coding can go after.
We have seen Token spending at Internet companies reach 15% to 20% of their IT spending, while at traditional enterprises it is still below 5%, which leaves a lot of headroom. Alibaba Cloud's goal this year is for Token revenue to account for at least 20% of the revenue from each customer.
Take the automotive industry. Our services used to be limited to moving ERP to the cloud, then supplying compute for intelligent driving, and later powering large-model chat. Now we even handle advertising and marketing. The auto industry's largest IT investment used to be ERP; now it has shifted to AI.
The same goes for finance. With brokerage clients, it used to be hard for us to talk business because theirs is such a specialized field. Now clients come to us, because cutting-edge investment research, quantitative trading, and private equity all need to be deeply integrated with large models.
Agents have become the biggest driver of the model market and even of the existing cloud market, so at Alibaba Cloud, Tokens and the cloud are naturally bound together. That is why coding is our most important direction: it applies to almost everything.
2. Agents are naturally a growth flywheel for cloud services
There is a conversion ratio between Tokens and GPUs. Our actual data show that since the Agent boom, consumption of one GPU card basically drives consumption of one CPU card. For example, every 100 yuan spent on GPU inference generates 200 yuan in combined GPU and CPU cloud spending, because the applications Agents generate need to be deployed, run, and elastically scaled.
This means a vendor without a powerful CPU cloud cannot serve these Agents. That is why we keep talking about Agent Cloud: there is a genuine closed revenue loop in this process.
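The 100-to-200-yuan example is simple pull-through arithmetic. The sketch below is only an illustration under a stated assumption: it treats Liu's observed one-GPU-to-one-CPU rule of thumb as a flat multiplier on spend (the hypothetical parameter `cpu_per_gpu_yuan`), which the article presents as an empirical observation, not a pricing formula.

```python
# Back-of-the-envelope sketch of GPU-to-CPU spend pull-through.
# ASSUMPTION (illustrative only): CPU cloud spend scales linearly with GPU
# inference spend via a fixed ratio; 1.0 reproduces the 100 -> 200 yuan example.

def total_cloud_spend(gpu_inference_yuan: float, cpu_per_gpu_yuan: float = 1.0) -> float:
    """Return combined GPU + CPU cloud spend driven by GPU inference spend.

    cpu_per_gpu_yuan is a hypothetical parameter: yuan of CPU consumption
    generated per yuan of GPU inference.
    """
    if gpu_inference_yuan < 0 or cpu_per_gpu_yuan < 0:
        raise ValueError("spend and ratio must be non-negative")
    cpu_spend = gpu_inference_yuan * cpu_per_gpu_yuan  # deploy/run/scale the Agent-built apps
    return gpu_inference_yuan + cpu_spend


if __name__ == "__main__":
    print(total_cloud_spend(100))  # 200.0
```

Keeping the ratio as a parameter makes it easy to see how sensitive the total is if the observed 1:1 relationship shifts.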
3. Alibaba Cloud's sales system needs to change rapidly
We now assign dedicated MaaS salespeople to large customers, working alongside the existing IaaS sales team. Even if this means some overlap in sales coverage for a single customer, we accept the cost, because the most important thing is not to miss any opportunity.
Frankly, anyone who has worked in cloud for a long time develops a certain mindset. In traditional cloud business, the picture was clear: we could estimate what it would cost to move a given number of a customer's on-premises servers to the cloud, and the result would not be far off. With MaaS, the results can far exceed expectations. MaaS also means dealing with business units and CEOs rather than just IT staff, which is a challenge in itself.
Interestingly, the more traditional an enterprise is, the more likely it is to embrace AI, because AI simplifies parts of its work. Even livestock farming companies are now adopting AI at scale, which would have been unimaginable before.
4. The Token war should focus on both quantity and quality
Call volume can be pumped up with simple chats, but that is not our focus. We track three indicators: whether the number of paying Token customers grows every day; whether each customer is connecting the model to its core systems to meet real needs; and how efficiently Agents complete tasks end to end on their own. The consensus in the United States is the same: accomplish the most valuable tasks with the fewest Tokens, rather than more tasks with more Tokens.
Because we pursue high-quality Tokens, MaaS should be profitable from day one. Our Bailian platform (Infra) and the model team work together every day to optimize the inference framework.
Pricing in China is currently based mainly on volume, but our ultimate goal is for customers to pay for results.
Cover source | AI-generated