Can Alibaba Cloud win back from ByteDance's Volcengine the returns Joe Tsai and Wu Yongming want?
On May 20th, Alibaba Cloud released its new-generation flagship model, Qwen3.7-Max. That evening, Joe Tsai and Wu Yongming issued a letter to shareholders, once again emphasizing the strategic importance of Alibaba Cloud.
Two weeks earlier, Volcengine had disclosed a set of figures: according to an IDC report, Volcengine's share of Token calls in China's enterprise-level MaaS market for full-year 2025 reached 49.5%, against 28% for Alibaba Cloud and 10% for Baidu.
Alibaba Cloud's model has become stronger, yet in call volume it still trails Volcengine by more than 20 percentage points.
Just one day before the Qwen3.7-Max launch event, Alibaba Cloud made a high-profile release of a different industry ranking. According to Omdia's figures for total AI cloud revenue across the full IaaS, PaaS, and MaaS stack, Alibaba Cloud ranked first with a 35.8% share, while Volcengine had 14.8%.
Two rankings, two first places, two narratives: that alone shows how intense the competition between the two has become. What neither ranking answers yet is who is building a real moat behind the numbers.
01. The evaluation game is no longer just about benchmark scores
At yesterday's press conference, the evaluation list Alibaba Cloud presented was much longer than before. Beyond general benchmarks such as GPQA, mathematics, and coding, it included a series of Agent-focused tests, such as SWE-Pro, MCP-Mark, Qwen SVG, Qwen World Bench, Qwenclaw, and ClawEval.
After three years of benchmarking across the large-model industry, scores on MMLU, HumanEval, and Arena have converged to the point where they can no longer answer the question of "who is stronger".
So the new round of competition has shifted to who gets to set the questions. Agent-task evaluation is more complex and has more variables: the model's own ability, the quality of the tool interfaces, how tasks are decomposed, and the scoring rules. Every link can affect the final result, which gives large companies with strong R&D capacity plenty of room to maneuver. They write the task types they excel at into the evaluation framework to secure an advantage on the new track.
OpenAI has its own Evals, Anthropic has its Claude engineering task sets, and Google leans on AIME and competitive-programming results. Alibaba Cloud's rollout of a batch of Qwen-prefixed special tests this time follows the same logic. In February this year, Volcengine's Doubao 2.0 large model likewise updated a batch of multimodal evaluation dimensions, emphasizing OS Agent tasks and complex instruction following while deliberately avoiding the tracks where it would compete head-on with Qwen.
Each company grades itself on the questions it is good at. Alibaba wins on questions set by Alibaba, and Volcengine wins on questions set by Volcengine. The results have some reference value, but not much.
The capability gap between flagship models is narrowing rapidly, and the marginal explanatory power of benchmarks is shrinking along with it. A 0.5-point lead on a given leaderboard does not necessarily translate into a stable advantage in real-world business. What enterprise customers ultimately care about is not who ranks first, but whether the model can complete tasks consistently, recover from its own errors, and run at a cost that can be clearly calculated.
The focus of Agent competition is shifting from "answer quality" to "execution reliability". That is what Qwen3.7-Max set out to prove, but the figures shown at the press conference are not enough on their own.
02. Developer entry points: Three completely different paths
Alibaba Cloud stresses that Qwen3.7-Max generalizes across frameworks and explicitly supports tools such as Claude Code, OpenClaw, and Qwen Code.
Claude Code is one of Anthropic's fastest-growing product lines this year, with a growing and increasingly sticky user base.
Alibaba Cloud's approach is to explicitly separate the tool layer and the model layer. Developers continue to use Claude Code, but the underlying call is replaced with Qwen.
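To make the decoupling concrete, here is a minimal Python sketch under stated assumptions. `ModelBackend`, `CodingHarness`, `StubClaudeBackend`, and `StubQwenBackend` are hypothetical names invented for illustration; they are not the real interfaces of Claude Code, Qwen Code, or any vendor SDK, and the stub backends return canned text instead of calling a real API. The point is only structural: the tool layer depends on an interface, so the model behind it can be swapped without touching the tool.

```python
from typing import Protocol


class ModelBackend(Protocol):
    """Hypothetical model-layer interface: anything that turns a prompt into text."""

    name: str

    def complete(self, prompt: str) -> str: ...


class StubClaudeBackend:
    """Hypothetical stand-in for a Claude-served backend (no real API call)."""

    name = "claude-stub"

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] plan for: {prompt}"


class StubQwenBackend:
    """Hypothetical stand-in for a Qwen-served backend (no real API call)."""

    name = "qwen-stub"

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] plan for: {prompt}"


class CodingHarness:
    """Hypothetical tool layer: owns the workflow, delegates generation to its backend."""

    def __init__(self, backend: ModelBackend) -> None:
        self.backend = backend

    def run_task(self, task: str) -> str:
        # The prompt format belongs to the tool layer, not to any one model.
        prompt = f"Task: {task}\nRespond with a step-by-step plan."
        return self.backend.complete(prompt)


if __name__ == "__main__":
    harness = CodingHarness(StubClaudeBackend())
    print(harness.run_task("fix the failing test"))
    # Swap the model layer; the harness code is unchanged.
    harness.backend = StubQwenBackend()
    print(harness.run_task("fix the failing test"))
```

Swapping the backend is a one-line change, which is the opening this strategy relies on. The flip side is that once a tool starts depending on behavior only one backend provides, the interface stops being neutral.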
In effect, Alibaba Cloud lets Anthropic build the ecosystem while it supplies the replacement model underneath. Whether this works depends on Anthropic's willingness to cooperate, and for now Anthropic is tying Claude and Claude Code ever more tightly together.
Volcengine takes a different path. In March 2026, it officially launched ArkClaw, a cloud-based, SaaS-style hosting service for OpenClaw.
ByteDance's strategy is to spare developers the trouble of setting up a local environment, configuring APIs, and installing Python. After subscribing on the Volcengine Ark console and clicking "Create Now", a developer can have a cloud-based OpenClaw instance up and running within two minutes. ArkClaw is deeply integrated with the Feishu ecosystem and can be installed in one click from the Feishu app marketplace. Users can @ the agent directly in a chat window to book meeting rooms, generate documents in batches, and manage multidimensional tables.
DeepSeek has been moving at the same time. It recently posted two new positions, Harness Product Manager and Harness R&D Engineer. The job descriptions state plainly: "All work except the model itself falls within the scope of Harness." The hires will take part in the entire development process of the "DeepSeek desktop Agent product" and help "define DeepSeek's understanding of Harness". Earlier, in March, DeepSeek posted 17 Agent-related positions requiring candidates to have "extensively used well-known Agents such as Claude Code, OpenClaw, and Manus". From a hiring wave to a dedicated team, DeepSeek's shift from focusing only on models to also building upper-layer products is now clear.
Alibaba Cloud targets the model-replacement opportunity within the global developer toolchain. Volcengine embeds Agent capabilities into Feishu, an office IM widely used by Chinese enterprises. DeepSeek builds its own desktop Agent product and competes head-on with Claude Code. The three target overlapping customer groups, but their core strategies are very different.
Each of the three paths has its own strengths and weaknesses. ArkClaw's strengths are its low barrier to entry and the natural reach of the Feishu ecosystem. Its weakness is that ByteDance's B-side customers are mainly cutting-edge developers and AI start-ups, and its ability to win heavyweight enterprise customers in finance, manufacturing, and government is limited. Nor can it easily bundle model services with adjacent cloud products such as storage, databases, and security the way Alibaba Cloud does. Alibaba Cloud's full-stack footprint gives it more clout with enterprise customers, but it also means longer sales cycles and more customized delivery. DeepSeek's strength lies in the technical reputation of its models, but product, operations, and user retention are not natural strengths for a model company.
Some developers say frankly that they have no intention of migrating, even with Alibaba's Bailian platform on offer. "Migration is a cost in itself. I would only consider it if Qwen were significantly stronger than other models or completely free."
Anthropic has begun tying the Claude model and Claude Code more closely together: more stable project-level context, more refined tool-calling protocols, and some capabilities designed to be fully unlocked only by Claude. Once the tool layer and the model layer are re-coupled, third-party models may still connect, but they will merely "run" rather than work well.
03. A strong model does not equal strong cloud revenue
On the day Qwen3.7-Max was released, Alibaba Group chairman Joe Tsai and CEO Wu Yongming jointly issued a letter to shareholders in an unusually direct tone: "The AI business has crossed the initial investment stage and officially entered the commercial return cycle." The letter also said Alibaba is increasing its investment in full-stack AI capabilities to "build more powerful MaaS products to more efficiently connect models with applications".
This is the capital story that Alibaba Cloud most wants to tell: the stronger the model, the more the cloud business will benefit.
Alibaba's footprint spans cloud infrastructure, large models, enterprise customers, e-commerce and office scenarios, chips, and servers, making it the closest thing to a truly full-stack AI player among Chinese cloud providers. In November last year, Alibaba launched the "Bailian Campaign", aiming to triple Bailian's Token call volume in the short term. In March this year, group CEO Wu Yongming personally led the creation of the Alibaba Token Hub business group. The market has broadly endorsed this direction.
However, growth figures do not mean growth quality.
Return to the two rankings from the opening. By IDC's measure of Token call volume, Volcengine held 49.5% for full-year 2025 versus 28% for Alibaba Cloud. By Omdia's measure of full-stack AI cloud revenue, Alibaba Cloud held 35.8% and Volcengine 14.8%. Behind this "two first places" pattern are two completely different business logics.
Data shows that Token-billed MaaS revenue currently accounts for less than 1% of the overall AI cloud market. Although Token call volume is growing at an astonishing rate, it has not yet become a major source of revenue. This means Volcengine still has a long way to go from "first in MaaS Token call volume" to "first in AI cloud revenue". Likewise, Alibaba Cloud's lead in full-stack AI cloud revenue does not automatically mean the value of the Qwen model has been fully realized.
A company may choose Alibaba Cloud's AI services because Qwen is highly capable; because of low prices or compliance requirements; or because it already sits inside Alibaba Cloud's procurement system. Of these three sources of revenue, only the first is worth building a story around. The other two are simply traditional cloud revenue in AI guise. So far, Alibaba Cloud has not disclosed, and cannot easily disclose, how its revenue splits across these three types.
The most dangerous pattern for an AI cloud business is impressive growth figures with poor underlying quality: high trial volume but low retention; rising call volume but gross margins eaten up by GPU costs; many signed projects, but mostly heavily customized deliveries that are hard to replicate at scale. In that case, however strong Qwen is, it only helps Alibaba Cloud sell more computing power and does not build a real moat at the model layer.
Volcengine faces the same problem: Token call volume does not necessarily translate into high-quality enterprise revenue. According to media reports, front-line Alibaba Cloud salespeople say bluntly that since 2024 Volcengine has been Alibaba Cloud's biggest threat, and the two often compete fiercely for the same customers. But which customers? AI start-ups with thin margins and high churn, or finance and manufacturing enterprises with high stickiness and long procurement cycles? The value of these two types of customers differs enormously.
Baidu offers another reference point. It ranked first in China's AI public cloud market for several consecutive years, but it is quietly shifting its strategy. This year, Baidu AI Cloud changed its advertisement at Beijing's airport from "No. 1 in AI public cloud market share" to "the Chinese cloud provider that won the most large-model projects, by both number and value". The implication is that central and other state-owned enterprise customers are Baidu AI Cloud's comfort zone, and head-on MaaS competition is no longer its main battlefield.
High-quality AI cloud revenue should have these characteristics: customers keep calling, usage deepens, switching costs rise over time, unit costs fall with scale, and the model drives coordinated growth in adjacent cloud services such as storage, databases, security, and data governance.
So far, none of these AI cloud providers has clearly explained this chain in its financial reports.
04. Who is building a real moat
After Qwen3.7-Max's release, observers will keep comparing it with Claude, GPT, and DeepSeek to see which is stronger, but the answer will matter less and less.
Competition in the Agent era is no longer just a contest of model capabilities. Model capability is the foundation; control of evaluation determines who shapes the narrative; developer entry points determine who captures the calls; cloud infrastructure determines delivery capability; and revenue quality determines commercial value.
From this perspective, there are roughly four paths in the Chinese AI cloud market.
Volcengine's path is to use the Doubao app's massive consumer (C-side) user base to hone its enterprise (B-side) capabilities, drive volume quickly with ultra-low prices and low-barrier tools, establish "first in Token call volume" as a mindshare barrier, and then build deeper enterprise partnerships on the back of that call volume. The risk is a shallow customer base: once a cheaper competitor appears, switching costs are low. It remains unclear whether the "Token first" label is a moat or just a traffic bubble.
Alibaba Cloud's path is to compete for technical authority through model capability, lock in enterprise workloads with the Bailian platform and its full-stack product matrix, win developers by supporting mainstream tools such as Claude Code, and ultimately close the loop among models, toolchain, and cloud infrastructure. The risk is that the chain is long and has many nodes: a problem in any one link affects the whole, and being "full-stack" means staying competitive on every sub-battlefield, which consumes enormous resources.
Baidu takes the third path: giving up head-on competition in MaaS, holding on to the government, state-owned, and central state-owned enterprise markets, and building barriers through compliance and security capabilities. This path is the safest but also the least flexible.
DeepSeek chooses to compete head -