Feng Dagang in Conversation with Tan Dai of Volcano Engine: 1 Yuan for 284 Images, but Doubao Has No Intention of "Racing to the Bottom"
The year 2024 is drawing to a close, and the large model market remains fiercely competitive.
On one hand, news of large funding rounds for major startups at home and abroad keeps coming. AI data analytics company Databricks has even set a new global fundraising record for AI startups, targeting more than 70 billion yuan in a single round, surpassing OpenAI.
On the other hand, led by OpenAI's run of year-end releases, models and products worldwide have seen a concentrated wave of updates. On December 18, Volcano Engine likewise brought a series of upgrades to the Doubao large model family at its 2024 Winter FORCE Prime Mover Conference.
For instance, Doubao Pro, ByteDance's most capable large language model, received a major version upgrade. Its overall task-handling ability has improved 32% since May, fully matching GPT-4o, at one eighth of GPT-4o's price.
The newly launched Doubao visual understanding model can not only accurately recognize visual content but also understand and reason about it. It can perform complex logical calculations from image information, handle tasks such as analyzing charts, processing code, and answering exam questions, and produce nuanced visual descriptions and creative writing.
Notably, at this FORCE conference, Volcano Engine announced an input price for the Doubao visual understanding model of just 0.003 yuan per thousand tokens: one yuan buys the processing of 284 images at 720P, 85% below the prevailing industry price, which caused a stir in the industry.
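The quoted figures are internally consistent, which a rough back-of-the-envelope check confirms. Note that the per-image token count below is not stated in the article; it is an assumption derived from the quoted price and image count:

```python
# Sanity check of the quoted Doubao pricing. The price and image count come
# from the article; tokens-per-image is back-derived, not an official figure.

PRICE_PER_1K_TOKENS = 0.003   # yuan per 1,000 input tokens (quoted price)
BUDGET = 1.0                  # yuan
IMAGES_PER_YUAN = 284         # 720P images the article says 1 yuan buys

tokens_affordable = BUDGET / PRICE_PER_1K_TOKENS * 1000
tokens_per_image = tokens_affordable / IMAGES_PER_YUAN

print(f"Tokens for 1 yuan: {tokens_affordable:,.0f}")          # ~333,333
print(f"Implied tokens per 720P image: {tokens_per_image:,.0f}")  # ~1,174
```

So one yuan covers roughly 333,000 input tokens, implying a 720P image costs on the order of 1,200 tokens, a plausible magnitude for vision-model image encodings.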
Tan Dai, President of Volcano Engine, told 36Kr that even at this price, Volcano Engine's gross margin is in fact quite healthy. The low price is not a "burn money on subsidies" market strategy; it was achieved through joint innovation across algorithms, engineering, and hardware, and through advances from the underlying system architecture up to the application layer. Volcano Engine hopes to help enterprises and developers adopt large model technology more easily and at a lower threshold, popularize AI, and grow the overall market.
According to Volcano Engine's data, over the past three months, invocations of the Doubao large model have grown 39x in information-processing scenarios, 16x in customer service and sales, 13x in hardware assistants, and 9x in AI tools. On the B side it has a large number of partners in automotive, finance, and education, including Mercedes-Benz, GAC Group, Huatai Securities, China Merchants Bank, and Zhejiang University.
Since ChatGPT launched at the end of November 2022, the large model boom has swept the globe for more than two full years, and in that time the market landscape has changed dramatically. In the to-B field especially, more and more enterprises have taken off the rose-tinted "filter" and begun evaluating large models in terms of real application scenarios and economic value.
How exactly does Volcano Engine reduce the cost of deploying large models through technological innovation? What impact and opportunities have large models brought to the cloud computing industry? What kind of AI cloud-native architecture will the next decade bring? Around these questions of broad industry concern, Tan Dai, President of Volcano Engine, and Feng Dagang, CEO of 36Kr, held an in-depth conversation during the 2024 Winter FORCE Prime Mover Conference.
The following is a transcript of their conversation.
01. What is the most important thing in the past year?
Feng Dagang: Although today's core questions come from us, we also had Doubao generate a version, which I found quite interesting. For example, here is today's first question: we all know you have deep expertise and rich experience in the technical field. What prompted you to move from a giant like Baidu to ByteDance and take charge of Volcano Engine's push into the enterprise market?
Tan Dai: That experience mattered a great deal. I had long been a technical lead at Baidu, and along the way I gradually took on some new roles.
Shifting from technology to market development is not easy. If you are a pure technologist without experience managing a large team or a business background, people tend to assume you should stay in technical work. But looked at another way, a lack of direct experience can also produce thinking and methods completely different from your predecessors'. Fortunately, ByteDance gave me that opportunity.
Feng Dagang: These days, do you spend more of your time on internal management or on external competition?
Tan Dai: I don't think the question should be split into internal versus external. Put another way: first you must solve the problem of productivity, and second the problem of production relations, and production relations span both the internal and the external.
An important responsibility of a manager is, first, to solve the key problems, and second, to solve the problems only he can handle. Some things simply cannot be solved, so there is no need to spend much energy on them. Instead, focus on the problems that can only be solved at your own level.
Feng Dagang: What problem is that? For example, what problem have you had to solve over the past year?
Tan Dai: This year my energy has mainly gone into making the model work for the B side, both internally and externally. For internal products, we have to think not only about improving the model's quality but also about lowering the cost of the engineering architecture and improving the product's usability. The same goes externally: how do we get more people to use it, and feed their feedback back into improving the product? Beyond that, we have to work out what kind of service team, lineup, and organizational form is needed to serve customers well and connect everything smoothly.
There is a lot of uncertainty in all of this. The product is still at the zero-to-one stage, customer needs are uncertain, and model capability is improving rapidly; once the market, technology, and product take shape, a matching organizational structure has to be built to carry them. These things matter most to me, and they are the problems only I can solve.
02. Before large models, B-end and C-end technologies were entirely separate
Feng Dagang: How do you view the current competition among large models across the to-B and to-C fields?
Tan Dai: Large models differ from every technology that came before. Previously, to-B and to-C technologies were separate worlds. Using Douyin personally has nothing to do with using Volcano Engine; shopping on Taobao does not mean using Aliyun. They are completely different. Today, however, large models on the to-B and to-C sides remain tightly coupled, and the capability behind the applications comes mainly from the model. Whether large model technology serves to-C or to-B, the core is no longer as cleanly divided as before.
But this has its benefits. A defining feature of the to-B market used to be that decision-makers were separated from users, so we never knew how users actually used the product. Many CRM and ERP products are like this: the person who makes the buying decision is not the person who uses it, cannot experience it firsthand, and can only understand others' usage through slide decks and case walkthroughs.
Large models are different. First, a large model can be experienced fully. Second, in many scenarios the decision-makers and the users are connected. For every customer we meet now, including chairmen and CEOs, my first suggestion is that they download the Doubao app and use it themselves, because once the person in charge has a feel for AI and uses it every day, they can see which parts of their enterprise could be improved with a large model.
I think this is not just a change in large model technology itself; it will profoundly change the business models of both to-B and to-C. In the past, the business side had no feel for how the cloud was used. The people who really dealt with the cloud were operations staff, while R&D staff saw only a pile of numbers and noticed the cloud only when an incident occurred. With large models it is different: everyone is first a user who can feel the product's strengths and weaknesses in daily life, and can then discuss how to use it as a tool to raise productivity.
Feng Dagang: Is the gap between different large models' C-end products significant?
Tan Dai: The gap between C-end products may be even larger. For example, how the prompts are written and how the product's interaction design is optimized both add points, and there are gaps in the quality of the models themselves as well.
I usually don't tell people outright how good Doubao is; I just tell them what capabilities we have. Looking at parameters alone is meaningless, or at best a very one-sided piece of information. What matters is your impression after using it. Have you used other products? How do they compare in your own experience? When we debate whether a large model is good, a slide deck cannot override your day-to-day experience; you will form a clear judgment that this product really is easy to use. Many people tell me they find Doubao easy to use, and crucially, Doubao is improving very fast; the slope of its learning curve is steep. That matters a great deal.
03. Who is number one?
Feng Dagang: If we ask who is "number one" in the field of large models, what do you think is the most critical indicator?
Tan Dai: In terms of consumption, token consumption matters most; it represents how much inference is actually being used. Contract value is not a good dimension because there are too many system-integration projects. The more privatized a project is, the more components it contains: hardware, software, application development, outsourced labor. What share does the large model account for? Different counting standards yield different conclusions.
Feng Dagang: Where do you think Volcano stands now?
Tan Dai: There is no third-party data, but judging from the figures the industry has announced, I think Volcano is in a very strong position. Everyone counts differently, though. We report token counts directly, while some others report only invocation counts. At present every model is priced per token. I consider this a rather primitive business model, though I don't deny it may persist for a long time; eventually it may evolve into charging by value rather than by tokens.
For example, OpenAI's current subscription is $200, and it wants to build a $2,000 product, because its AI capability has improved significantly and can deliver correspondingly higher value to you. That is the long-term evolution of the business model.
Feng Dagang: The value-based charging you describe doesn't seem to have been implemented by the Internet giants yet.
Tan Dai: That will come as model capabilities gradually strengthen. A model needs to handle very complex agent tasks before it can be charged by value. I hope we will see some pilots in 2025.
Feng Dagang: How do you define charging by value?
Tan Dai: This model has existed at Volcano from the beginning. In many fields we take an end-to-end approach to solve the harder problems for enterprises, save them money, and help them earn more. For example, Volcano's earliest product provided recommendation services: through A/B testing, I can tell you exactly how much extra money my service helps you earn, and then I take a share of it. We need to achieve the same with large models. The core of value-based charging is that AI must go deep into the business, and large models will have the opportunity to do that in the future.
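The revenue-share model Tan Dai describes can be illustrated with a toy calculation. All the numbers below, including the 20% share rate, are hypothetical assumptions for illustration, not Volcano Engine's actual terms:

```python
# Toy illustration of value-based charging via A/B testing.
# All figures are hypothetical; the 20% share rate is an assumption.

control_revenue = 1_000_000.0    # yuan earned by the control group (no service)
treatment_revenue = 1_080_000.0  # yuan earned with the recommendation service
revenue_share_rate = 0.20        # vendor's assumed share of the measured uplift

uplift = treatment_revenue - control_revenue  # value attributable to the service
vendor_fee = uplift * revenue_share_rate

print(f"Measured uplift: {uplift:,.0f} yuan")        # 80,000 yuan
print(f"Vendor fee at 20%: {vendor_fee:,.0f} yuan")  # 16,000 yuan
```

The key point of the design is that the fee is tied to an uplift measured experimentally (treatment minus control), not to raw usage such as tokens, which is what distinguishes value-based charging from per-token pricing.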
Feng Dagang: Many people, Kimi for example, are now saying that new-user acquisition is not what matters and that retention is what counts. What do you think retention depends on?
Tan Dai: On the C side, retention requires safeguarding the user experience, and the same holds on the B side. We watch retention very closely now: will users who use it this week still use it next week? Although we don't follow the C-end system of retention-rate and activity-rate metrics, if users don't come back in the second week or the second month, it means you haven't done a good job.
04. The era of AI cloud-native
Feng Dagang: In this Doubao update (at the Winter FORCE Prime Mover Conference), what do you think most deserves attention?
Tan Dai: This release and upgrade of the Doubao large model focuses on two aspects.
First, Doubao Pro, our most capable language model, gets a major version upgrade. This version fully matches GPT-4o and can solve harder problems. At the same time, we released the Doubao visual understanding model. Vision is the most important means by which humans understand the world, and the same is true for large models.
Beyond the models themselves, the second highlight is a series of agent development platforms and tools for putting the models into production, including new capabilities in Volcano Ark such as multimodal search and recommendation powered by large models. We also provide more than 100 industry application templates to help enterprises build these things at low cost.
Next, we will keep working toward stronger models, lower costs, and easier-to-deploy solutions. Large models now account for a growing share of enterprise IT workloads, and enterprise IT architecture as a whole has reached an inflection point. At first we talked about traditional IT architecture, then about cloud-native; we believe that with AI, the industry will now move toward AI cloud-native.
Feng Dagang: How should we understand AI cloud-native? How does it differ from cloud-native?
Tan Dai: Some people say "AI-native," but I think "AI cloud-native" is the more accurate term. Behind AI sits compute-driven logic, so compute consumption in the cloud will only grow, and the elasticity and built-in redundancy that cloud-native brought will carry over into AI cloud-native. At the same time, AI has brought great changes to computing and to data security. Previously, all of our computing architectures were optimized for the CPU. Now the GPU sits alongside the CPU, and the architecture must be rebuilt around the GPU. Traditional Ethernet architecture can no longer meet the new demands at the compute, network, and data layers; we need to rebuild the data-flow system with the GPU at its core.
Moreover, at the data layer, one of the greatest values large models bring is that we can finally handle unstructured data with ease. Previously, the first hurdle of digital transformation was converting unstructured data into structured data, a process that easily loses a great deal of information beyond recovery. Extracting value from unstructured data afterward was also very difficult. With large models, unstructured data such as audio and video can be handed directly to the model to process.
On the other hand, we also need to consider how to uniformly store and