Exclusive Interview with Tan Dai of Volcengine: A High-Quality Model Is the Top Priority for MaaS, and Doubao 2.1 Has Officially Entered the Competitive Arena
Text | Deng Yongyi
Editor | Zhang Yuxin
Tan Dai, President of Volcengine. Source: Volcengine
In the past three years, Tan Dai, the president of Volcengine, has gone through the same cycle of setting MaaS (Model as a Service) revenue targets three times: at the beginning of the year he sets a very high target, and the team thinks it is too difficult to achieve; by mid-year, "it's actually almost accomplished," and the target is raised again.
At the beginning of 2026, ByteDance's video model Seedance 2.0 gave Volcengine a strong start. A latecomer to the cloud market, Volcengine has ridden the AI wave to growth that surprised the industry.
"I told everyone two years ago that MaaS is a big business, but you didn't believe me at that time," Tan Dai told 36Kr.
Since the second half of 2025, Coding models and video models have successively unlocked commercial production scenarios. This has made people realize that the capability boundaries of models remain hard to predict. Top-tier models are currently the core growth engine for MaaS, and there is no other.
At the Volcengine Force Conference on June 23, ByteDance's new-generation flagship model, Doubao Big Model 2.1 Pro, made its debut. This means that, in addition to video generation, ByteDance has finally filled in the Coding piece of the puzzle on the model side.
Tan Dai's verdict on Doubao Big Model 2.1 Pro is that, in terms of Coding and Agent capabilities, "it can finally enter the game." On the well-known programming benchmark Terminal Bench, Doubao Big Model 2.1 Pro roughly matches Claude Opus 4.7, performs well on long-horizon, complex tasks, and reaches the threshold of usability.
This is the market Volcengine cares about most. Stronger Coding and Agent capabilities mean the model can enter the core production processes of more enterprises and individuals and, in turn, create more commercial value.
In addition to the flagship model, Volcengine also brought a series of model updates, including the 4K version of Seedance 2.0, the image generation model Seedream 5.0, the Doubao voice generation model 1.0, and Seedance 2.5, which will be released in July.
"The video generation model is actually one route to implementing a world model, and at present it is a relatively mature technical solution that can be scaled up through unsupervised methods," Tan Dai said. The Seedance model shows a precise reproduction and understanding of the physical world, which makes synthesizing high-quality visual data more feasible and accelerates research in fields such as embodied intelligence and autonomous driving.
When we interviewed Tan Dai in 2024, he thought the era of large models had only reached the "brick phone" stage. Two years later, it has fast-forwarded to the feature phone stage: large models are no longer toys for a few people but have truly entered the lives and work of many more.
Volcengine has now reached a leading position in the MaaS market. The latest data shows that, compared with the end of 2025, Volcengine's daily Token consumption has increased by 50% to 180 trillion, more than 1,500 times the level of two years ago. The number of customers in the "trillion club" (those with cumulative Token consumption at the trillion level) has also doubled, exceeding 200.
Tan Dai said that, given the models released this time and those that may be released later this year, Volcengine has raised its revenue target for the year.
Behind this is a change in the pricing logic of models. In 2024, Volcengine was one of the first vendors to bring large model prices down to the "floor price," but at this conference it no longer mentioned this.
"The reason for the price cut in 2024 was that all models could do was act as Chatbots, and that's what the models were worth," he told 36Kr. Now, however, models can enter core production processes.
This leads to a bigger question: when large models truly enter the core production processes of more industries, what changes will AI bring to the cloud industry?
At the end of 2024, someone asked Tan Dai: if you can make money by selling APIs, why do you still need to do cloud services? Cloud services were once considered a good business with long-term prospects, but after more than a decade of development they have become a highly competitive red ocean in China.
In Tan Dai's view, the question itself does not hold up. MaaS and cloud services have never been in opposition. In the future, the cloud is more likely to use Agents to schedule IaaS, PaaS, and SaaS. Traditional cloud services will not disappear but will become part of the AI cloud. "The new workloads built on models and Agents may be 10 or 20 times larger than those of traditional cloud services."
Tan Dai also rejected the view that "MaaS services have no loyalty." "There was no stickiness either when cloud computing was just selling hosts in the early days," he said. "Right now people are using AI at a relatively shallow level. Once models truly enter the core production system of an enterprise, the coupling will be stronger."
Clearly, both Volcengine and other cloud service providers now regard AI as the most important, or even the only, growth engine. Tan Dai sees this as self-evident: "If you go back to 2012, when ByteDance was founded, would you have focused on PC search at that time?"
The question left for Volcengine is how it can keep winning in the MaaS market.
Tan Dai doesn't have a complete answer yet, but one thing is certain, and it is also the most difficult: keeping the model in the lead over the long term.
I. The Model Has Finally Entered Core Production Processes
36Kr: In the past year, Volcengine has grown rapidly. What is the core driving force?
Tan Dai: Essentially, it's because the model has unlocked real production-level scenarios and entered core production processes. The more challenging and valuable the productivity scenario is, the greater the value created once it is unlocked.
One main line is video generation. Seedance is the world's first model to truly unlock commercial production scenarios.
Another main line is LLM/Agent. Production-level use was unlocked after Claude Opus 4.6 came out last year. Cursor published an analysis: before Claude Opus 4.6 came out, code completed by pressing the Tab key accounted for a larger share than code written automatically by the Agent. After that, the situation reversed. This shows that since 4.6, model capabilities have improved greatly and can truly be used in production-level Coding and Agent scenarios.
36Kr: How can you tell that Seedance 2.0 has truly achieved commercial production?
Tan Dai: Before Seedance 2.0 came out, most video models were used to produce UGC and PGC entertainment videos, which were difficult to apply to serious creative scenarios such as movies, TV shows, and advertisements.
We can also see this change in usage volume. Previously, usage of video generation models was higher on weekends than on weekdays, just like many entertainment-oriented consumer products. After Seedance 2.0 came out, that was no longer the case. Its weekday load is more than twice its weekend load, indicating that people are really using it for work.
Video generation is also one of the paths to the world model and has great application potential in real-world industries. Seedance has been deployed in fields such as embodied intelligence, industrial manufacturing, and intelligent driving, providing new tools for business needs such as data synthesis, scenario simulation, and process demonstration.
36Kr: Before Seedance 2.0 came out, did you expect it to be a big hit within the company?
Tan Dai: It can't be considered a big hit. We originally set an even more aggressive target, but it still seems challenging to achieve now.
36Kr: Why can Seedance 2.0 achieve such good results?
Tan Dai: It reflects our comprehensive capabilities. To do video generation well, you need a relatively good language model as a foundation, and your image generation and VLM (vision-language model) capabilities also need to be strong enough.
The strong performance of Seedance 2.0 can be attributed to the capabilities of Doubao itself. This is an important advantage for us over vertical companies that focus only on video models.
Another point is that China's content creation sector is among the most active in the world. That has a lot to do with why China was the first to develop the best video models.
36Kr: Some in the market think that the war in video generation is over and that ByteDance holds a dominant position. What's your view?
Tan Dai: We're not at that stage yet. The penetration rate of AI in video generation is actually still very low.
The outside world now pays too much attention to Seedance's short-term revenue and overlooks its technical value. Video generation is a relatively mature technical solution that can be scaled up through unsupervised methods. The Seedance model shows a precise reproduction and understanding of the physical world, which makes synthesizing high-quality visual data more feasible, accelerates research in fields such as embodied intelligence and autonomous driving, and will have great application potential in real-world industries.
Moreover, if AI really creates value, it does so not by replacing what came before but by making the whole industry bigger.
36Kr: At this Force Conference, you also released a new flagship model, Doubao Big Model 2.1. How do you define this model?
Tan Dai: I think Doubao Big Model 2.1 Pro has reached a usable standard. It is comparable to Claude Opus 4.6 and has crossed the threshold of usability for Agents.
Doubao Big Model 2.1 also marks our true entry into the game in the Coding field. This is very important, and not many domestic players have truly entered the game.
36Kr: How do you define "usable"?
Tan Dai: There are several characteristics:
First, strong Coding ability. In the digital world, strong Coding ability means the model can flexibly call scripts and tools, and its generalization ability is also very strong.
Second, the ability to complete complex general Agent tasks. This means calling tools better, handling long-horizon tasks, working well with memory, adapting to various Harnesses and frameworks, and having good VLM capabilities, since many inputs need to be processed visually, as in Computer Use.
Third, the ability to be applied at scale. If the model is good but too expensive, it won't work; if latency is too high, for example more than 20 milliseconds, it won't work either; and the model also needs to be able to support more services at scale.
Doubao Big Model 2.1 performs very well in these respects. In terms of Coding ability, it can even exceed Claude Opus 4.6. In terms of large-scale application, the task mode just launched in the Doubao App is based on Doubao Big Model 2.1.
36Kr: In the Coding scenario, when do you think Chinese models will truly catch up?
Tan Dai: It will probably be in Q2 this year. Many models have claimed before that they are on par with others, but just saying it is useless. If you have really caught up or even surpassed them, people will pay for you. You can tell whether you've achieved it by looking at the ARR.
36Kr: Compared with video, why is the progress in the domestic Coding scenario generally slower?
Tan Dai: First, the global competition in LLMs is more intense. Second, we started late. Anthropic and OpenAI started much earlier, and they were the first to define and focus on the Coding direction. It's normal for our overall progress to be behind theirs, since this was a very difficult thing to begin with.
36Kr: There used to be a separate Coding model, SeedCode. Will you still develop it?
Tan Dai: After the release of Doubao Big Model 2.1, it's no longer available. The Coding and Agent capabilities have been integrated into the main version.
The models are iterating too fast now. We don't want to wait for one or two months to release a new version, so we've launched a new series called Seed Evolving, which will be updated every one or two weeks based on Doubao Big Model 2.1.
36Kr: Is this model mainly targeted at the developer community and optimized in the Coding and Agent directions?
Tan Dai: It's not just for developers. Some enterprises want stable model performance, with neither pleasant surprises nor nasty shocks, so they can use Doubao Big Model 2.1 directly. But many people always want the latest and smartest version, and Seed Evolving is designed to meet their needs. It is not a trial-and-error version, though, and it will go through strict evaluations.
36Kr: Now that you've unlocked the two main production-level scenarios, which do you think is more important at present, LLM or video generation?
Tan Dai: From my perspective, LLM is actually more important because it can create more value. Although Seedance currently sells more, I hope that LLM will become the larger part in the future.
II. To Fully Release the Model's Capabilities, a "Middle Layer" Is Needed
36Kr: The market consensus is that this is an era in which models drive product growth. People think that as long as the model is SOTA enough, it will sell well. So where does Volcengine's value show up?
Tan Dai: There's actually a lot we can do. The stronger the model's capabilities, the greater the responsibility.
For example, Seedance 2.0 became popular before the Spring Festival, but Volcengine's API was launched at least two months later, in April. What were we doing? We were mainly working on copyright protection. Because we think that in addition to doing a good job in model inference, Guardrail (model safeguard) is also very important.
Looking at LLMs, their capabilities are actually fully released through the Harness. Seedance currently lacks a Harness of its own. Recently, we've been thinking about how to work with industry to build this Harness for different models. We now have a team of FDEs (forward deployed engineers) working with various industries on this.