
The performance of large models doubles every 100 days: the "Densing Law" proposed by a Tsinghua University team has been published in a Nature sub-journal.

AI Frontline, 2025-11-20 16:45

Since 2020, the Scaling Law proposed by OpenAI has driven the rapid development of large models: the larger the model's parameter count and training data, the stronger its capabilities. Entering 2025, however, this path of ever-increasing training cost faces a serious sustainability problem. Ilya Sutskever, former chief scientist of OpenAI, pointed out in a public talk that, with the publicly available corpora on the Internet approaching exhaustion, large-model pre-training cannot continue as before ("Pre-training as we know it will end"). Many researchers have therefore begun to explore new development paths for large models.

A research result from Tsinghua University, the "Densing Law" of large models, provides such a new perspective. The result was recently published in the Nature sub-journal Nature Machine Intelligence, offering a new dimension for understanding how large models develop. The Densing Law states that the maximum capability density of large language models increases exponentially over time: from February 2023 to April 2025 it doubled roughly every 3.5 months. In other words, every 3.5 months a model with half the parameters can match the previous state-of-the-art performance.

Paper link: https://www.nature.com/articles/s42256-025-01137-0

The "Densing Law" Inspired by "Moore's Law"

Looking back at the history of computing, the semiconductor industry, guided by Moore's Law, has continuously improved manufacturing processes and increased chip circuit density, achieving the leap from the 27-ton ENIAC to smartphones weighing a few hundred grams and ultimately bringing about the popularization of computing power and the information revolution. Today the world has 1.3 billion personal computers, 7 billion smartphones, 18 billion IoT devices, and 200 billion running CPUs. The core of Moore's Law is not to enlarge the chip but to improve circuit density: fitting more computing units into the same area.

Inspired by this, the research team proposed that the development of large models can also be observed and understood from the perspective of "capability density". Just as the chip industry has achieved the miniaturization and popularization of computing devices by increasing circuit density, large models are also achieving efficient development by increasing capability density.

The Densing Law of Large Models: The Capability Density of Large Models Shows an Exponential Upward Trend over Time

The research team started from a core assumption: models of different sizes that are built with the same "manufacturing process" and fully trained have the same capability density. On this basis, the team selected a benchmark model series and set its density to 1 as the baseline for measuring the capability density of other models. The capability density of a target model is then defined as the ratio of the parameter count of an equally capable benchmark model to the parameter count of the target model.
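As a rough sketch of this definition (the sigmoid performance curve and its constants below are illustrative placeholders, not the paper's actual fitting procedure), the density can be computed by inverting a fitted performance curve for the benchmark family:

```python
import math

# Hypothetical fitted performance curve for the benchmark model family:
# benchmark score as a function of parameter count N (in billions).
# The sigmoid shape and its constants are illustrative only.
def ref_performance(n_params_b: float) -> float:
    return 1.0 / (1.0 + math.exp(-(math.log(n_params_b) - 1.0)))

def effective_params(score: float, lo: float = 0.01, hi: float = 10000.0) -> float:
    """Invert the benchmark curve by bisection (in log space): how many
    parameters would a benchmark-family model need to reach this score?"""
    for _ in range(100):
        mid = math.sqrt(lo * hi)
        if ref_performance(mid) < score:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def capability_density(target_params_b: float, target_score: float) -> float:
    """Density = parameters of an equally capable benchmark model,
    divided by the target model's own parameter count."""
    return effective_params(target_score) / target_params_b

# A hypothetical 7B model matching the score of a 14B benchmark model
# has capability density 14 / 7 = 2.
score = ref_performance(14.0)
print(round(capability_density(7.0, score), 2))  # → 2.0
```

In practice the paper fits the benchmark performance-versus-parameters curve from real evaluation data; the bisection step here simply makes the "equally capable benchmark model" in the definition computable.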

Through a systematic analysis of 51 open-source large models released in recent years, the research team found an important regularity: the maximum capability density of large models grows exponentially over time, doubling on average every 3.5 months since 2023. This means that, with the coordinated progress of data, computing power, and algorithms, the same level of intelligence can be achieved with ever fewer parameters.
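Under this law, the parameter count needed to hold capability fixed shrinks geometrically. A minimal sketch (the 70B starting point is an illustrative assumption, not a figure from the paper):

```python
# Densing-law projection: maximum capability density doubles every
# 3.5 months, so the parameter count needed to match a fixed level of
# capability halves on the same schedule.
DOUBLING_MONTHS = 3.5

def params_needed(initial_params_b: float, months_elapsed: float) -> float:
    """Parameters (in billions) needed to match the initial capability
    after `months_elapsed` months, under the Densing Law."""
    return initial_params_b / 2 ** (months_elapsed / DOUBLING_MONTHS)

# After 7 months (two doublings of density), a hypothetical 70B-class
# capability would need only 70 / 4 = 17.5B parameters.
print(params_needed(70.0, 7.0))  # → 17.5
```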

Based on the Densing Law, the research team drew several important inferences.

Inference 1: The inference cost of models with the same capability decreases exponentially over time

On the one hand, the Densing Law implies that the parameter count of models with a given capability halves every 3.5 months. On the other hand, inference systems keep improving: Moore's Law drives continuous gains in chip compute, while algorithmic techniques such as model quantization, speculative sampling, and GPU-memory optimization keep advancing, so ever larger models can be run at the same inference cost. Empirical data show that the API price of GPT-3.5-level models fell by a factor of 266.7 in 20 months, roughly halving every 2.5 months.
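The two quoted figures are mutually consistent, which a quick back-of-the-envelope check confirms:

```python
import math

# A 266.7x price drop over 20 months implies a halving time of
# 20 / log2(266.7) months, since (1/2)^(months / halving_time)
# must equal 1 / 266.7.
price_ratio = 266.7
months = 20
halving_time = months / math.log2(price_ratio)
print(round(halving_time, 2))  # → 2.48, i.e. roughly every 2.5 months
```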

Inference 2: The capability density of large models is accelerating

Statistics using MMLU as the evaluation benchmark show that before the release of ChatGPT, capability density doubled every 4.8 months; after its release, it doubled every 3.2 months, a 50% increase in the rate of density growth. This suggests that, as large-model technology matures and the open-source ecosystem flourishes, the improvement in capability density is accelerating.
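The 50% figure follows directly from the two doubling times, since the exponential growth rate is inversely proportional to the doubling time:

```python
# Growth rate ∝ 1 / doubling_time, so the speed-up from shortening the
# doubling time from 4.8 to 3.2 months is 4.8 / 3.2 - 1 = 0.5, i.e. 50%.
before, after = 4.8, 3.2
speedup = before / after - 1
print(f"{speedup:.0%}")  # → 50%
```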

Inference 3: Model compression algorithms do not always enhance the capability density of models

The research team compared several models with their compressed versions and found that, except for Gemma-2-9B, the compressed models, such as Llama-3.2-3B/1B and Llama-3.1-minitron-4B, have lower capability density than the original models. Quantization likewise reduces model performance and capability density. This finding reveals a limitation of current model-compression techniques: the smaller models produced during compression are often insufficiently trained to reach optimal density.

Inference 4: Model miniaturization reveals the huge potential of edge-side intelligence

The intersection of the two curves, chip circuit density (Moore's Law) and model capability density (Densing Law), means that edge-side devices will be able to run increasingly capable large models. Edge computing and on-device intelligence will grow explosively, and the popularization of computing power will move from the cloud to the terminal.

Guided by the Densing Law, the teams from Tsinghua University and Mianbi Intelligence have continued to develop high-density models, releasing a series of edge-side models such as MiniCPM ("Mianbi Xiaogangpao"), MiniCPM-V/o, and VoxCPM. Known worldwide for their efficiency and low cost, they were rated among the most downloaded and most popular Chinese large models on Hugging Face in 2024. As of October 2025, total downloads were close to 15 million and GitHub stars close to 30,000.

This article is from the WeChat public account "AI Frontline". Author: the TsinghuaNLP team at Tsinghua University. Republished by 36Kr with permission.