Two years ago, Tsinghua University made a prediction that is now becoming a global consensus. Three major AI institutions, including Meta, have reached the same conclusion.
[Introduction] Here's the remarkable part: the AI progress data recently measured by Meta and METR lines up almost perfectly with the "Density Law" proposed by a Chinese team two years ago. Silicon Valley is suddenly looking back and realizing that Chinese researchers have led this field for two years!
In the past week, three of the world's most serious AI research institutions have independently collided on the same conclusion!
On April 3rd, the US research institution METR quietly updated a technical report. The core conclusion can be summarized in one sentence.
The length of tasks AI can complete doubles every 88.6 days.
Five days later, on April 8th, Meta's Superintelligence Lab released a new model, Muse Spark, and published a training-efficiency curve known internally as the "scaling ladder". Its conclusion is also one sentence.
To match the performance of Llama 4 Maverick from a year ago, the new model needs less than one-tenth of the training compute.
One report measures task duration; the other measures training compute. The two institutions did not coordinate, and their methods share no overlap.
But when the two curves are converted to the same coordinate system, their slopes almost completely overlap.
By this point, the coincidence is already striking.
What's more striking still is that this curve was drawn in full by a Chinese team two years ago and published in a Nature family journal.
It's called the Density Law.
Two years ago, someone drew this line in advance
This concept first appeared in a paper titled "Densing Law of LLMs".
The authors are a joint team from Mianbi Intelligence and Tsinghua University, led by Professors Sun Maosong and Liu Zhiyuan. The first author is Ph.D. student Xiao Chaojun.
The paper was posted on arXiv in December 2024 and was accepted by Nature Machine Intelligence in November 2025.
Paper (arXiv): https://arxiv.org/abs/2412.04315
Paper (Nature Machine Intelligence): https://www.nature.com/articles/s42256-025-01137-0
The core judgment of the paper can be summarized in one sentence.
The capability density of models increases exponentially over time: the number of parameters required to reach a given capability level halves every 3.5 months.
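Stated as a formula, the claim is that the parameters needed for a fixed capability level follow N(t) = N₀ · 2^(−t/3.5), with t in months. A minimal sketch of the projection (the starting size and horizon are illustrative, not from the paper):

```python
def params_needed(n0_billion: float, months: float, halving_months: float = 3.5) -> float:
    """Parameters (in billions) needed to match a fixed capability level
    after `months`, under the Density Law's 3.5-month halving period."""
    return n0_billion * 0.5 ** (months / halving_months)

# A hypothetical 70B-class capability level, projected one year forward:
print(round(params_needed(70, 12), 1))  # -> 6.5
```

Read the other way, the same exponent says a fixed parameter budget buys roughly 10x the capability density per year.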
In late 2024, this statement sounded a bit extreme.
At that time, the entire industry was in awe of the scaling law. OpenAI was piling up model parameters, Anthropic was doing the same, and so was Meta.
Everyone thought that the larger the parameters, the stronger the intelligence, and burning GPUs to the limit was the right way.
But the research team didn't think so.
They measured every influential open-source foundation model of the time, from Llama-1 to Gemma-2 and MiniCPM-3, 51 models in total, under a single standard.
After running five major benchmarks, the result was an almost perfect exponential relationship, with an R² of 0.934.
Because large-model evaluations are easily skewed by data contamination, they re-measured on a newly built contamination-filtered dataset, MMLU-CF. The R² was 0.953.
Both fits gave an R² close to 1; statistically, that is very unlikely to be coincidence.
In other words, every mainstream open-source model released over those two years, regardless of team or architecture, falls on the same exponential line of "doubling every 3.5 months".
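The fit described above is an ordinary least-squares regression of log2(density) against release date; R² then measures how tightly the points hug that line. A toy version with made-up data points (not the paper's 51 models):

```python
import math

# (months since an arbitrary start, capability density) -- fabricated points
# that happen to double roughly every 3.5 months, for illustration only.
data = [(0.0, 1.0), (3.5, 2.1), (7.0, 3.9), (10.5, 8.2), (14.0, 15.8)]

xs = [t for t, _ in data]
ys = [math.log2(d) for _, d in data]  # fit in log space

n = len(data)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

# R^2 of the log-linear fit, and the implied doubling time in months.
ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - my) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot
doubling_months = 1 / slope

print(f"R^2 = {r2:.3f}, doubling time = {doubling_months:.2f} months")
```

The slope of the log-linear fit is doublings per month, so its reciprocal is the doubling time the paper reports.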
Up to this point, the story was just "a Chinese team proposed a seemingly radical empirical law".
What really turned this into a "moment" was what happened over the following half-year.
Three institutions, three methods, the same slope
Let's take a look at the conclusions of Mianbi, Meta, and METR.
- Mianbi's Density Law measures "how many parameters are needed for the same capability level". Conclusion: the parameter requirement halves every 3.5 months.
- Meta's scaling ladder measures "how much training compute is needed for the same capability level". Conclusion: Muse Spark saves an order of magnitude in compute compared with Llama 4 Maverick from a year ago.
- METR's time-horizon report measures "how long a task the same model can handle". Conclusion: the task duration doubles every 88.6 days.
Three measurement standards. Three institutions. Three research paths with no overlap.
But when all the numbers are converted to the same coordinate system, their curve slopes almost completely overlap.
The most easily overlooked point is that, of the three, the Density Law came first: nearly two years before Meta's scaling ladder and more than a year before METR's complete modeling.
When Meta drew that scaling ladder in its early-April blog post, it probably didn't realize the graph's shape is almost identical to a curve shown on a slide at an academic conference in Beijing in 2024.
What kind of observation deserves the term "law"
In the scientific community, there is an unwritten standard to determine whether an empirical observation is qualified to be called a "law".
It's not about how beautiful the data is, but about whether it can hold true in multiple independent measurement systems.
Moore's Law is a law because the semiconductor industry has verified it for decades from three completely different dimensions: lithography precision, transistor density, and unit computing power cost.
The Density Law follows the same path.
It began as a fitted curve from a single team. By the time it was accepted by a Nature family journal, it had been reproduced on a contamination-filtered dataset. This month, it was independently verified twice more, in Meta's training data and in METR's task evaluations.
In a broader context, this moment is very similar to when electricity first entered New York in the 1880s.
At that time, different inventors, different engineers, and different cities were each working on their own power grids. It wasn't until someone drew the development curves of all the projects on one piece of paper that people realized that this wasn't just a few scattered engineering improvements, but a new era was quietly unfolding.
This time, it took less than a year from the paper's publication to its verification by peers worldwide.
Three inferences, each rewriting industry assumptions
If the Density Law holds, it will rewrite many things simultaneously.
First, the inference cost will collapse faster than everyone expected.
One corollary of the Density Law is that the inference cost of an LLM at a fixed performance level halves every 2.6 months.
Reality has already outpaced that rate.
The latest tracking data from Epoch AI shows that for an LLM at the performance level of Claude 3.5 Sonnet, token prices have dropped 400-fold in the past year; at some capability levels the fastest decline has reached 900-fold per year.
At the end of 2022, GPT-3.5 was priced at $20 per million tokens. Today, Mistral Nemo costs just $0.02, a 1,000-fold drop, and the model is stronger.
In hindsight, the paper's prediction was conservative.
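For a sense of why the observed declines "exceed" the prediction: halving every 2.6 months compounds to only about a 24.5-fold cost drop per year, well short of the 400- to 900-fold declines cited above.

```python
# Annual price drop implied by "inference cost halves every 2.6 months".
halving_months = 2.6
predicted_annual_drop = 2 ** (12 / halving_months)
print(round(predicted_annual_drop, 1))  # -> 24.5, vs the observed 400x-900x
```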
Second, the breakout moment for edge-side intelligence is closer than everyone thought.
Compounding the Density Law with Moore's Law yields an even more striking number.
By current estimates, the largest effective model that can run on a chip of the same price doubles roughly every 88 days.
That figure is nearly identical to the 88.6 days METR measured. Two entirely different calculation paths converge almost to the decimal point.
Within three to five years, running today's top GPT-class models on an ordinary laptop, or even a phone, may no longer be science fiction.
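Two exponential processes compound by adding their growth rates, so the combined doubling time T satisfies 1/T = 1/T₁ + 1/T₂. A sketch of how the "~88 days" figure can arise, assuming an 18-month Moore's-Law doubling period (the article does not state which figure it uses):

```python
DAYS_PER_MONTH = 30.44

t_density = 3.5 * DAYS_PER_MONTH  # Density Law doubling period, ~106.5 days
t_moore = 18 * DAYS_PER_MONTH     # assumed Moore's-Law doubling period, ~548 days

# Growth rates (1/T) add when two exponential processes compound.
t_combined = 1 / (1 / t_density + 1 / t_moore)
print(round(t_combined))  # -> 89, close to the ~88-day figure quoted above
```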
Third, the optimal strategy in the large - model industry is quietly reversing.
In the past three years, the industry's understanding of the scaling law has always been "piling up parameters and data".
But the Density Law yields a counter-intuitive judgment: with density growing exponentially, the strongest model at any given moment enjoys an optimal window of only a few months.
Pouring every resource into training a larger model, only to be surpassed three months later by a new model half its size, is not cost-effective.
The truly sustainable path is to invest resources in improving the density itself. This includes better architectures, higher - quality data, and smarter training algorithms.
Mianbi has been following its own measurement standard
It's worth noting that the Density Law is not just a paper that ends after publication.
Mianbi Intelligence, the company that proposed this theory, has been verifying it with its own "MiniCPM" series of models in the past two years.
When MiniCPM-1-2.4B was released in February 2024, its performance matched or exceeded that of the Mistral-7B from September 2023. That is, within four months, it reached the same performance with 35% of the parameters.
That figure went directly into the Nature family paper as the first empirical case of the Density Law.
Since then, the open-sourced MiniCPM series has covered four directions for sub-10B models: text, multimodality, speech, and omni-modality. In China, apart from Alibaba, only Mianbi has achieved this level of open-source completeness.
So far, the global open - source download volume of the MiniCPM series has exceeded 24 million times.
It's not the largest model in the industry. But it's the first team in the industry to implement "density first" as the company's methodology.
When Meta and METR each verified the Density Law in their own ways during that week of April 2026, this Chinese company, which began training models by this methodology back in 2024, already had a two-year head start in engineering practice.
This time, Chinese researchers are at the starting point of the curve
A theoretical framework proposed by a Chinese research team two years ago is being rediscovered again and again by the most serious overseas institutions such as Meta and METR in their own ways.
It may take some time to fully understand the significance of this.
This is not a story of "we can do it too". It's a story of "we saw it earlier".
Such moments are rare in the history of science. A judgment doubted in 2024 has, by 2026, become the single curve that multiple independent lines of evidence point to.
This kind of "coincidence" across regions, methods, and institutions has happened several times in physics, and each time it marks the end of an old paradigm and the beginning of a new one.
This time, Chinese AI researchers are at that starting point.
And that curve is still rising at a rate of doubling every 88 days.
Reference materials:
The "Density Law" first proposed by Mianbi Intelligence is recognized by top overseas institutions such as Meta
https://arxiv.org/abs/2412.04315
https://www.nature.com/articles/s42256-025-01137-0