A Chinese large-model research team makes the cover of Nature Machine Intelligence. Liu Zhiyuan drops a bombshell: looking forward to "creating AI with AI" next year.
Over the past half century, the capital expenditure and innovation rhythm of the global technology industry have been tied to one rule: Moore's Law, which holds that chip performance doubles every 18 months.
Beyond Moore's Law there is the "Andy-Bill Law": the hardware performance gains delivered by Moore's Law are quickly absorbed by growing software complexity. Andy is Andy Grove, the former CEO of Intel; Bill is Bill Gates, the founder of Microsoft.
This upward spiral of "hardware supply and software consumption" has driven the industrial evolution in the PC and Internet eras.
Times have changed, and both Andy and Bill have stepped back from the industry's front line. Yet the rule's underlying logic remains intact and has been pushed to new heights by a new "Andy and Bill".
The emergence of ChatGPT opened the era of generative artificial intelligence. Under the Scaling Law, model parameters have expanded exponentially; software's demand for computing power far outstrips the supply pace of Moore's Law, and the marginal cost of AI development has soared.
When hardware supply hits ceilings in energy, data, and beyond, the old "Andy-Bill" growth paradigm begins to fail.
The industry needs a reverse revolution. Large models, the "software" of the AI era, must be rebuilt through aggressive algorithmic and engineering optimization to unleash stronger capabilities from existing hardware.
In 2025, Chinese large-model companies became the most determined practitioners of this path.
From DeepSeek V3, whose fine-grained Mixture of Experts (MoE) architecture matches top-tier models at roughly 1/10 of the computing cost, to breakthroughs by teams like Kimi in the sparse attention mechanism, Chinese large-model companies, dubbed an "Oriental power", are trying to narrow the computing-power gap through architectural innovation.
Liu Zhiyuan, an associate professor in the Department of Computer Science at Tsinghua University, and the team at Mianbi Intelligence, which he co-founded, are typical representatives. Their MiniCPM ("Little Cannon") series achieves intelligence comparable to cloud-based large models with only about 1/10 of the parameter count, a showcase of efficient AI at the edge.
In November 2025, the research of Liu Zhiyuan's team was featured on the cover of the top academic journal Nature Machine Intelligence, formally proposing the "Densing Law" for large models.
Based on a rigorous retrospective analysis of 51 mainstream large models, the paper reveals a striking nonlinear evolution rule: from 2023 to 2025, the intelligence density of large models doubled roughly every 3.5 months.
That evolution curve is about five times steeper than Moore's Law. It means that roughly every 100 days, half as many parameters can match the performance of the current best model: cost halves every 100 days and can fall to one-tenth within a year.
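The arithmetic behind these figures can be sketched in a few lines. This is only an illustration of the compounding implied by the paper's headline 3.5-month doubling period; the "one-tenth in a year" claim follows directly from it:

```python
# Compounding implied by the Densing Law, assuming a 3.5-month
# doubling period for intelligence density (the paper's figure).
DOUBLING_MONTHS = 3.5

def cost_fraction(months: float) -> float:
    """Fraction of today's parameter/cost budget needed to match
    today's best model after `months` of density growth."""
    return 0.5 ** (months / DOUBLING_MONTHS)

# After one doubling period (~100 days) the cost is halved;
# after 12 months it falls to roughly one-tenth.
print(cost_fraction(3.5))    # 0.5
print(cost_fraction(12.0))   # ~0.09
```

So 12 months spans about 3.4 doublings, and 0.5^3.4 is just above 0.09, which matches the "one-tenth within a year" figure.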
Such rapid iteration poses unprecedented challenges to technological innovation and industrial deployment. In a conversation with Tencent Technology, Liu Zhiyuan said bluntly: if a large-model company cannot recoup its costs within 3 to 6 months of releasing a new model, its business model is unsustainable, because later entrants can soon achieve the same capabilities with a quarter of the resources.
When the R&D iteration cycle compresses to the order of a hundred days, the supply of human intelligence approaches its limit, and the ultimate form of the industry is bound to change qualitatively. The hallmark of the Industrial Revolution was machines building machines; the productivity hallmark Liu Zhiyuan expects for the AI era is "using AI to create AI".
Only then can this intelligence storm, one that outpaces Moore's Law, be sustained.
Tencent Technology: Today's topic is your team's latest paper on the "Densing Law" of large models, published in Nature Machine Intelligence. Can you introduce the background of this research?
Liu Zhiyuan: Although the paper was published in 2025, the idea germinated in the first half of 2024. In early 2023, the emergence of ChatGPT triggered a global pursuit of large models, and Chinese teams were no exception. At the time everyone was researching how to reproduce ChatGPT; by the second half of 2023, front-line teams had basically completed that reproduction work.
We then began to think about the future path for large models. Some teams might continue to follow ChatGPT's technical route, training GPT-4-level models by scaling up parameters and data. That route has high certainty, but it means spending ever more money, which is obviously not sustainable: you cannot raise costs indefinitely to buy stronger capabilities.
Therefore, we began to explore how to achieve model capabilities with lower costs and higher quality.
In early 2024, our MiniCPM series verified this: with fewer parameters, we could reach capabilities that historically required several times or even dozens of times as many. That was an empirical result; we wanted to find the law behind it, which led to the exploration of the "Densing Law" in 2024.
Figure: The paper on the Densing Law was featured on the cover of Nature Machine Intelligence
Tencent Technology: Does this focus on the efficiency of large models stem from China's particular circumstances? Is it unique to China, or shared abroad?
Liu Zhiyuan: The pursuit of efficiency is indeed driven by China's limited computing power; we must focus on building higher-quality models with less compute. That is also why, in the second half of 2024, a cover article in The Economist noted that Chinese companies were bypassing the "computing-power wall" through technological innovation, citing Mianbi Intelligence and DeepSeek as examples.
But the pursuit of efficiency also fits the development logic of artificial intelligence itself. AI is a technological wave comparable to the Industrial Revolution; if everyone is to benefit, the technology cannot stay expensive. Like every technological revolution before it, it must deliver higher-quality products and services at lower cost.
Therefore, we are confident that the Densing Law is of great significance for the future development of artificial intelligence.
Tencent Technology: A key step in the "Densing Law" is quantifying "intelligence", which is a hard problem in itself. Before starting the research, why did you believe this was feasible?
Liu Zhiyuan: That is a very good question. In the Densing Law paper, we did not actually solve the scientific problem of "measuring the total amount of intelligence". Instead, we found a clever workaround: a reference model.
We assume that models trained with the same set of technical recipes have roughly the same density regardless of size. We take a model trained with such a recipe as the reference model and set its density to 1. Then, when a target model reaches a certain intelligence level, we observe how many parameters the reference model needs to reach the same level. Comparing the parameter counts required for the same capability gives the target model's relative density. This sidesteps the hard problem of directly measuring the total intelligence inside a model.
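The comparison logic described here can be sketched as follows. Note the fitted curve (`toy_curve`) and all numbers are made-up placeholders for illustration; the paper fits real scaling curves for the reference family:

```python
def relative_density(target_params_b: float, target_score: float,
                     ref_params_for_score) -> float:
    """Density of a target model relative to the reference recipe:
    parameters the reference needs to match the target's score,
    divided by the target's own parameter count."""
    return ref_params_for_score(target_score) / target_params_b

# Hypothetical fitted curve for the reference family: parameter count
# (in billions) the reference recipe needs to reach a benchmark score.
toy_curve = lambda score: 2 ** ((score - 50.0) / 10.0)

# A 2B target model matching what the reference needs 8B to reach
# has a relative density of 4.0.
print(relative_density(2.0, 80.0, toy_curve))  # 4.0
```

The reference model's own density is 1 by construction, since it needs exactly its own parameter count to match itself.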
Of course, how to measure the total amount of intelligence (Mass) is a basic scientific problem that artificial intelligence needs to solve in the next few years. Behind any major technological revolution in history, there has been scientific theory support, such as information theory for communication and thermodynamics for steam engines. In the future, intelligent science also needs to solve the problem of how to measure the total amount of intelligence.
Tencent Technology: At WAIC 2024, you said the Densing Law's doubling cycle was 8 months, but the paper's final figure is 3.5 months. Why is the evolution so much faster than you expected?
Liu Zhiyuan: When we first had this idea in the middle of 2024, the research was still in its early stages. The time span of observation and the number of models were limited, so the data at that time was not stable. The version we released in the second half of 2024 calculated it to be 3.3 months. By the time of the official publication this year, we added data on new models from 2025, and the cycle was revised to 3.5 months.
Actually, whether the specific cycle is three months or eight months is not what matters most. What matters is that this speed far outpaces Moore's Law's 18-month cycle: we are facing an intelligence revolution of unprecedented speed, with cost halving every 100 days and potentially falling to one-tenth within a year.
At the same time, we did observe an accelerating trend. Before 2023, the cycle was close to five months; after 2023, it shortened to a little over three months. We suspect this is because ChatGPT drew global attention, and the influx of resources and talent accelerated technological innovation.
So the "Densing Law" is not a law of nature but a kind of self-fulfilling prophecy of human society in this technological field: the more we invest, the faster density grows.
Tencent Technology: You just mentioned investment. The Scaling Law for large models embodies a brute-force aesthetic. Do you see the Densing Law and the Scaling Law as unified or contradictory?
Liu Zhiyuan: I think they are two sides of the same coin, complementary to each other. On the surface, the Scaling Law says the bigger the model, the stronger its capability. Behind it, we have found a general recipe for constructing intelligence (the Transformer architecture plus sequence-prediction learning) that lets intelligence keep growing within one model; it opened the road to general artificial intelligence. In the coordinate system, the Scaling Law is a continuously rising curve: larger parameter scale, stronger capability.
The Densing Law tells us that through continuous innovation in model architecture, data governance, learning methods, and so on, we can carry more intelligence with fewer parameters, finding a steeper Scaling Law curve: stronger capability at the same parameter count, or the same capability with fewer parameters. So without the Scaling Law there would be no Densing Law; both are crucial laws in the development of artificial intelligence.
Tencent Technology: The "Scaling Law" seems to be facing ceilings in terms of data, computing power, and energy. When will the Densing Law encounter a bottleneck?
Liu Zhiyuan: The continuous development of the Scaling Law is indeed restricted by electricity, computing power, data, etc. The Densing Law is a way to achieve a more sustainable Scaling Law. Through technological innovation to increase density, we can continuously improve model capabilities while keeping computing power or cost basically unchanged.
For example, DeepSeek V3 claims to match capabilities at 1/10 of the computing cost, and the continuous decline in OpenAI's API prices suggests they are serving the same capability with smaller models through internal technological innovation.
Of course, the problem of data exhaustion may have to be solved by another technology: large-scale reinforcement learning, that is, letting the model generate high-quality data for its own learning through self-exploration.
Tencent Technology: In 2025, what technological breakthroughs impressed you and made the Densing Law steeper?
Liu Zhiyuan: This year is a big year for model architecture innovation, mainly in three directions:
First, the fine-grained Mixture of Experts (MoE) architecture represented by DeepSeek V3 has matured. By sparsely activating a small number of experts, it significantly improves computational efficiency.
Second, the sparse attention mechanism has become popular. By reducing how much content participates in the attention computation, it can process long sequences efficiently. These two directions optimize the Transformer's FFN layer and attention layer respectively, achieving "on-demand allocation" of compute.
Third, the idea of the recurrent neural network (RNN) has been revived. Combined with the Transformer architecture, its "memory" mechanism reduces computational complexity. All of these innovations indirectly raise model density.
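The first direction, sparse expert activation, can be illustrated with a minimal top-k routing sketch. This is a generic NumPy toy, not DeepSeek V3's actual recipe; the shapes, the softmax gate, and the linear "experts" are all simplifying assumptions. The point is that only k experts run per token, so compute scales with k rather than with the total expert count:

```python
import numpy as np

def topk_moe_forward(x, w_gate, experts, k=2):
    """Minimal top-k Mixture-of-Experts routing for one token.
    x: (d,) input; w_gate: (d, n_experts) router weights;
    experts: list of callables, each mapping (d,) -> (d,)."""
    logits = x @ w_gate                    # router scores, (n_experts,)
    top = np.argsort(logits)[-k:]          # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over selected experts only
    # Only the k selected experts are evaluated.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n = 8, 16
x = rng.normal(size=d)
w_gate = rng.normal(size=(d, n))
# Each "expert" here is just a tiny linear layer.
mats = [rng.normal(size=(d, d)) for _ in range(n)]
experts = [lambda v, m=m: m @ v for m in mats]

y = topk_moe_forward(x, w_gate, experts, k=2)
print(y.shape)  # (8,)
```

With n = 16 experts and k = 2, each token pays for only 2/16 of the expert compute, which is the efficiency lever the fine-grained MoE designs exploit at much larger scale.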
In addition, large-scale reinforcement learning has made great leaps, especially in mathematics and code. Models keep improving their capabilities through self-exploration, with no end in sight for now, which helps address the data-exhaustion problem.
Tencent Technology: Do you think the Densing Law can be extended to multimodal models or world models?
Liu Zhiyuan: I think this is a general law. Although the doubling cycles in different fields may be different, as long as the models are general and follow the Scaling Law, they will definitely follow the Densing Law in the future. Just like Moore's Law for chips and the improvement of battery density, technological innovation always pursues achieving higher performance with fewer resources.
Tencent Technology: How do you view Google's newly released Gemini 3? Can it be called a milestone breakthrough?
Liu Zhiyuan: Internally, we think Gemini 3 is a very important milestone. It has achieved an unprecedented level of text control in image generation, indicating that the controllability of the model and its understanding of the world have reached a new level.
We speculate that it does not rely on the diffusion model alone but may also fold autoregressive ideas into it, achieving layer-by-layer refinement and high consistency during generation. Historically, text-to-image models have struggled to render text well; in my view, Gemini 3's breakthrough is a new paradigm that deserves close attention.
This also bears out the Densing Law: once a given level of intelligence is achieved, it will eventually run on smaller devices. For example, Gemini 3's current capabilities will one day run on phone, PC, or automotive chips.
Tencent Technology: There still isn't an on-device AI product that can replace the smartphone. Is that because the Densing Law hasn't evolved far enough?
Liu Zhiyuan: The development of on-device AI is constrained by multiple factors.
First, good on-device application scenarios are still missing. Current phone assistants have many users, but they are not tightly integrated with the hardware.
Second, the on-device technology ecosystem has not yet formed. AGI development has not converged, model capabilities are still improving, and product design cannot fully mask errors. Just as early search engines needed product refinement before they took off, combining AGI with smart devices will take time. Once the product form matures, wide adoption of intelligent devices will follow.
Tencent Technology: You mentioned that MiniCPM 4 can be regarded as a kind of "model process". How should we understand that?
Liu Zhiyuan: I prefer to compare this generation