Three Controversies of Large Models in 2025: Technology, Price, AGI
At a time when the "big model wall" theory is under constant discussion, the industry is pursuing exploration and innovation more urgently than ever.
Barely a month into 2025, big model players around the world seem to be collectively "chasing performance": OpenAI, Google, DeepSeek and others have released a flurry of new products.
Looking across the big model industry, large-scale disagreement and non-consensus have emerged almost for the first time:
1. Application vs Technology - Has progress on foundation models essentially stalled? Has the focus of industry innovation shifted to applications?
2. Price War vs Value War - Should there be a "price war"? How should it be fought? Can startups afford it?
3. Single-modal vs Multi-modal - How important is multi-modality for AGI?
At this crossroads, every big model company, whether by choice or by force, has picked its own position.
For example, OpenAI's o1 attempts to use reinforcement learning to "extend the life" of the Scaling Law, and Google's Titans has begun to explore a new memory architecture for models; at the same time, more players have shifted their attention to application optimization, feature updates, and user retention.
As one of China's "six little dragons" of big models, MiniMax has long been known in the industry for the strength of its products. At this point in time, it too has made its stance clear through open source and a series of updates.
In January 2025, MiniMax released four AI models within ten days: the foundation language model MiniMax-Text-01, the visual multi-modal model MiniMax-VL-01, the video model S2V-01, and the voice model T2A-01. Moreover, the two MiniMax-01 series models are the company's first ever open-source releases.
The founder said bluntly in a recent media interview, "If I could choose again, I would have made it open source from day one." It is common for a commercial company to move from open source to closed source - witness the joke that "OpenAI has become CloseAI" - but the reverse is rare.
This series of updates shows that MiniMax is trying to use an open-source, innovation-led, technology-driven approach to change the market's impression that its strength lies only in products. As the founder put it, "The technology brand matters, essentially, because the biggest driving force in this industry is technological evolution."
At the same time, facing the three points of "non-consensus" in today's big model industry, MiniMax is also trying to offer its own answers through this series of model updates.
Competing on Applications vs Competing on Technology
The Industry Arrives at Another "Transformer Moment"
Since last year, a notable trend in the big model industry has been that breakthroughs in underlying technology have started to slow.
OpenAI's GPT-5 has been repeatedly postponed and is still nowhere to be seen. The three pillars of AI - computing power, algorithms, and data - have all stagnated to varying degrees, and model capabilities seemed to stop growing in 2024.
In contrast, a "traffic-buying war" over big model applications has broken out.
According to AppGrowing data, since "Kimi" kicked off the traffic-buying war among domestic big models, the top ten domestic big model products have collectively placed more than 6.25 million ads, worth roughly 1.5 billion yuan at market prices.
So much so that jokes circulate in the industry: "The only ones making money from big models are Bilibili, Douyin, and Xiaohongshu"; "At least the shared-bike subsidy war benefited users - now only the advertising platforms are making money."
At the application level, many companies have chosen to focus on consumer apps, customized cooperation projects, and bespoke small models for government and enterprise clients. At the model technology level, most players at home and abroad have uniformly taken the relatively safe route of "benchmarking against GPT", following OpenAI's technical path in full - and when OpenAI seemingly "hit a wall", the whole industry appeared to slow down with it.
On January 15, MiniMax released and open-sourced its latest MiniMax-01 series models, including the foundation language model MiniMax-Text-01 and the visual multi-modal model MiniMax-VL-01.
A 68-page technical paper, "MiniMax-01: Scaling Foundation Models with Lightning Attention", released at the same time, set off discussion across almost the entire AI research community.
Silicon Valley tech outlet VentureBeat, along with AI researchers, investors, and creators, weighed in on the architectural innovation and long-text capability of the MiniMax-01 series models
In terms of parameters, MiniMax-01 has a total of 456 billion parameters, and its overall performance is on par with SOTA (state-of-the-art, i.e. industry-leading) models such as GPT-4o and Claude-3.5-Sonnet on multiple mainstream benchmarks. It supports inputs of up to 4 million tokens - 32 times the input length of GPT-4o and 20 times that of Claude-3.5-Sonnet.
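A quick back-of-the-envelope check of those multiples (assuming the commonly cited 128K-token context window for GPT-4o and 200K for Claude-3.5-Sonnet; these baseline figures are assumptions, not numbers stated in this article):

```python
# Rough arithmetic behind the "32x" and "20x" figures. The baseline context
# windows below (128K for GPT-4o, 200K for Claude-3.5-Sonnet) are assumptions
# based on commonly cited specs, not numbers from this article.
minimax_ctx = 4_000_000           # MiniMax-01 input length in tokens
gpt4o_ctx = 128_000               # assumed GPT-4o context window
claude_ctx = 200_000              # assumed Claude-3.5-Sonnet context window

print(minimax_ctx / gpt4o_ctx)    # 31.25 -> roughly 32x
print(minimax_ctx / claude_ctx)   # 20.0  -> 20x
```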
In the latest results on the LongBench V2 benchmark, MiniMax-Text-01 ranks third overall, behind only OpenAI's o1-preview and humans.
LongBench V2 ranking. LongBench V2 is a benchmark for deep understanding and reasoning over long, multi-task contexts in real-world scenarios
If the model were merely strong on benchmarks, MiniMax-01 would not have drawn such widespread attention from AI researchers.
What is remarkable is that MiniMax has, for the first time in an ultra-large commercial model of 456 billion parameters, introduced a Linear Attention mechanism that departs from the traditional Transformer design, attempting, at an extremely low compute cost, to offer a new answer to the problems plaguing the entire big model industry.
MiniMax-01 rebuilds the Transformer architecture at the model's very core: on top of the traditional design (the upper part of the figure below), it introduces Linear Attention, which is akin to changing the substance at the "molecular" level.
This is why the open-sourcing of MiniMax-01 has attracted so much attention in the AI research community.
Schematic diagram of the core architecture of MiniMax-01
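To make the difference concrete, here is a minimal sketch in plain NumPy - not MiniMax's actual Lightning Attention implementation - contrasting standard softmax attention, whose cost grows quadratically with sequence length, with a generic kernelized linear attention that reorders the matrix multiplications so the full n-by-n attention matrix is never formed. The elu(x)+1 feature map is a common choice from the linear-attention literature and is assumed here purely for illustration.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: materializes an (n x n) score matrix, O(n^2 * d)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                 # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                       # (n, d)

def linear_attention(Q, K, V, eps=1e-6):
    """Generic kernelized linear attention: computing K^T V first gives a
    (d x d) summary, so the n x n matrix is never formed and the cost is
    O(n * d^2). The elu(x)+1 feature map is illustrative only; MiniMax's
    Lightning Attention uses its own I/O-aware formulation."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))      # elu(x) + 1
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                                             # (d, d) summary
    z = Kf.sum(axis=0)                                        # (d,) normalizer
    return (Qf @ kv) / (Qf @ z[:, None] + eps)                # (n, d)

# Tiny demo: outputs differ numerically, but the point is the cost profile --
# doubling n quadruples the softmax score matrix, while the (d, d) summary
# used by linear attention stays the same size.
n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

In causal, autoregressive settings the same reordering is carried out with running prefix sums, which is what makes decoding over million-token contexts computationally plausible.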
Linear Attention was not first proposed by MiniMax, just as large language models were not first proposed by OpenAI. But MiniMax is the first player to apply it boldly and firmly at this scale, innovating all the way from algorithms to frameworks, and ultimately achieving a disruptive result.
It is this innovation at the most fundamental technical level that lets MiniMax-01 match industry SOTA performance at roughly one-tenth the compute cost of GPT-4o, while offering the world's first 4-million-token ultra-long context.
At the end of the technical paper, MiniMax's researchers note that one-eighth of MiniMax-01 still follows the traditional Transformer mechanism. They are now researching a more efficient architecture that would eliminate the traditional component entirely and thereby achieve an unlimited context window.
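One schematic way to picture that "one-eighth" hybrid (a sketch based only on the ratio stated above; the layer count and exact ordering are assumptions, not MiniMax's published configuration):

```python
# Hypothetical hybrid stack: every 8th attention layer uses traditional
# softmax attention, the other 7 use linear attention, so 1/8 of the layers
# remain "traditional Transformer". The layer count is illustrative only.
n_layers = 80
stack = ["softmax" if (i + 1) % 8 == 0 else "linear" for i in range(n_layers)]
print(stack[:8])                           # ['linear', ..., 'softmax']
print(stack.count("softmax") / n_layers)   # 0.125 -> one-eighth
```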
If that research succeeds, big models will no longer be limited by input length, and humanity will take another big step toward AGI (Artificial General Intelligence).
Just as the arrival of BERT ushered the big model industry into its "Transformer moment", we may, to some extent, be witnessing a "second Transformer moment".
Price War vs Value War
Computing Power Costs Remain High: "Everyone Is Working for NVIDIA"
Looking back at the big model industry's development in 2024, one keyword cannot be missed: "price war".
The battlefield is concentrated mainly on the B-end - more precisely, among big model vendors who sell API access to business users and charge by usage.
In early May 2024, when the domestic startup DeepSeek (Deep Exploration) released its latest model, DeepSeek-V2, it abruptly slashed its API price: the input price dropped to as little as 1 yuan per million tokens, roughly one percent of GPT-4 Turbo's price at the time. Industry players such as ByteDance, Baidu, Alibaba, Tencent, Zhipu AI, and iFLYTEK followed in full, and a sweeping big model price war began.
In contrast, the computing power price remains high.
Since ChatGPT took off at the end of 2022, NVIDIA's already scarce GPUs have soared further in price amid the global boom in AI big models, pushing NVIDIA's market capitalization past 3 trillion US dollars and past Apple, making it the world's second most valuable company after Microsoft.
NVIDIA GPUs are not only expensive but also hard to get. In 2023 there was even news of an overseas AI startup raising 2.3 billion US dollars with NVIDIA GPUs as collateral. Because compute is so expensive and scarce, even inside tech giants departments fight fiercely over the group's GPU allocation - hence the running joke among big model practitioners that "everyone is working for NVIDIA".
Caught between high compute costs on one side and a fierce price war on the other, big model vendors find themselves in a dilemma.
That does not mean there is no way out.
The answer sounds somewhat clichéd: problems created by technology must ultimately be solved by technology.
Take DeepSeek as an example. Like MiniMax, DeepSeek is a firm member of the "compete on technology" camp. After continuous technical optimization in 2024, its V3 model reached 671B parameters with a training cost of only 5.576 million US dollars. By comparison, training GPT-3 in 2020 cost close to 12 million US dollars, and training GPT-4 cost more than 100 million US dollars.
In fact, reducing training costs is not just a matter of model algorithms; it spans the many layers between compute and application - optimization and scheduling across algorithms, architecture, hardware, software, and toolchains - collectively known as AI Infra (AI infrastructure). With compute costs so high, the primary goal of AI Infra is to make the most of compute resources and cut model training and deployment costs as far as possible without sacrificing performance.
The Linear Attention mechanism introduced in MiniMax-01 essentially lowers the computational complexity of the attention calculation, and with it the compute cost. On top of that, MiniMax applied a series of techniques such as data packing, Linear Attention Sequence Parallelism (LASP+), and multi-level padding, optimizing everything from data and algorithms to GPU communication. As a result, its Model FLOPs Utilization (MFU) on NVIDIA H20 GPUs reaches as high as 75%, greatly reducing the model's training and inference costs. Its input price is just 1 yuan per million tokens, one-tenth that of GPT-4o.
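For context on what an MFU figure means, here is a rough sketch of how it is usually estimated: the FLOPs the model theoretically needs, divided by the peak FLOPs the hardware could deliver over the same wall-clock time. Every number below is a placeholder chosen for illustration, not MiniMax's published training setup.

```python
def model_flops_utilization(n_params, tokens, step_time_s, n_gpus, peak_flops):
    """Rough MFU estimate: theoretical training FLOPs (~6 * params * tokens,
    the common dense-Transformer approximation) divided by the peak FLOPs
    the GPUs could deliver in the same wall-clock time."""
    needed = 6 * n_params * tokens
    available = n_gpus * peak_flops * step_time_s
    return needed / available

# Placeholder numbers purely for illustration -- not MiniMax's real figures.
print(model_flops_utilization(
    n_params=45.9e9,       # assumed activated parameters per token
    tokens=4e6,            # assumed tokens processed in one step
    step_time_s=9.7,       # assumed wall-clock time per step, seconds
    n_gpus=1024,           # assumed cluster size
    peak_flops=148e12,     # assumed per-GPU BF16 peak, FLOP/s
))                         # ~0.75, i.e. roughly 75% MFU
```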
When a media outlet asked, "Which technical achievement of the past year is MiniMax most satisfied with?", the founder's answer was: AI Infra and compute optimization, along with multi-modality.
Single-modal vs Multi-modal: How Far Are We from AGI?
Multi-modality may be the area with the least disagreement in the industry - and the fiercest competition.
"Modality" is a computing term that can be understood as a classification of the channels through which computers and humans perceive and communicate - text, images, sound, video, and so on.
Today, apart from a few players who stick to a single modality, most AI companies on the market target the multi-modal track. The most basic are text and images; broader offerings extend to audio, video, 3D modeling, and more.
Take voice as an example. On January 20, MiniMax released the T2A-01 series voice big model, which supports 17 languages. It has now been launched on its Conch Voice product and is open to all users.
You can hear the synthesized results in the demo video below.
From the 16th second of the video onward, even without looking at the picture, you can fairly accurately judge each speaker's gender, age, and emotion: a white-haired elderly person, a determined and serious woman, an angry teenager, and an innocent child. The voice tones each carry their own sadness