
The "father" of the Transformer lambasts: current AI has hit a dead end, and fine-tuning is just a waste of time!

新智元 · 2026-01-17 15:29
Transformers are not the end of AGI. New architectures may require biological inspiration.

[New Intelligence Yuan Introduction] The Transformer was once the pinnacle of the AI revolution, but one of its inventors, Llion Jones, warns that it is not the end. Just as the RNN was replaced, much of today's fine-tuning research may amount to mere local optimization, and the real AGI breakthrough may lie in a brand-new, biologically inspired architecture.

Is Transformer the end of AI?

No, definitely not.

Is scaling the only path to AGI?

The person who has studied the Transformer architecture the longest tells you: No.

Llion Jones, founder and research scientist of Sakana AI, invented the Transformer together with seven co-authors.

Except for those seven co-authors, no one has studied the Transformer longer than he has.

Even so, last year he made an important decision: to significantly scale back his research investment in the Transformer.

Not because there is nothing new left in the field, but because it has become extremely crowded.

He said bluntly that he has become a victim of his own success:

I don't think the Transformer is the end, nor do I believe that we just need to continue to scale infinitely.

One day, we will have another breakthrough, and then look back and find that a lot of current research is actually a waste of time.

Transformer May Repeat the Tragedy of RNN

Before the emergence of the Transformer, RNN was the mainstream.

RNN was indeed a major breakthrough in the history of AI.

Suddenly, everyone began to work on improving RNN.

But the results were always minor adjustments to the same architecture, such as moving a gating unit and nudging language-modeling performance from 1.26 to 1.25 bits per character.

After the emergence of the Transformer, when we applied a very deep decoder-only Transformer to the same task, we immediately reached 1.1 bits per character.
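For context, "bits per character" is the average negative log2-probability a language model assigns to each character of a held-out text; lower means better prediction. A minimal sketch of the computation (illustrative only, with made-up toy probabilities):

```python
import math

def bits_per_character(char_probs):
    """Average negative log2-probability the model assigned to each
    character of the test text; lower is better."""
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)

# Toy example: probabilities a hypothetical model assigned to each
# character of a short test string.
probs = [0.5, 0.4, 0.45, 0.5]
bpc = bits_per_character(probs)  # roughly 1.12 for this toy sequence
```

On this scale, the jump the article describes, from 1.25 to 1.1 bits per character, means the model needs noticeably fewer bits to encode the same text.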

So, all the research on RNN suddenly seemed to be in vain.

And now, papers seem to be back on the old track: making countless tiny changes to the same architecture, such as adjusting the position of the normalization layer or slightly improving the training method.

In 2020, Sarah Hooker, then a researcher at Google Brain, proposed the "hardware lottery":

There is more than one path to AGI. Deep neural networks just happened to hit the hardware lottery like GPUs.

Paper link: https://hardwarelottery.github.io/

The term "hardware lottery" describes a research idea that wins because it happens to fit the existing software and hardware, not because it is universally superior to the alternative research directions.

Llion Jones believes the Transformer won a similar architectural lottery, and that the industry may repeat the mistakes of the RNN era.

Some architectures have already outperformed the Transformer in papers. The problem is that none of them is better by enough for the entire industry to abandon the Transformer.

The reason is very practical: people's understanding of the Transformer is already very mature, and the training methods, fine-tuning methods, and supporting software tools are all in place.

It's impossible for people to switch to a new set from scratch unless the new architecture is "overwhelmingly superior".

The Transformer replaced the RNN because the gap was too large to ignore.

The rise of deep learning followed the same pattern: people once believed symbolic AI was more reliable, until neural networks showed overwhelming advantages in image recognition.

Llion Jones believes that the Transformer is too successful, which has trapped people in a "trap":

It's like a huge "gravity well", and all new methods trying to leave will be pulled back.

Even if you really create a new architecture with better performance, as long as OpenAI scales the Transformer ten times, your results will be outshined.

Current LLMs Are Not General Intelligence

Llion Jones further pointed out that current large language models are not general intelligence; they exhibit "jagged intelligence".

That is to say, they can perform like geniuses in some tasks, but can make stupid mistakes in an instant, which is very jarring.

A model may have just solved a doctoral-level problem, yet the next second it botches a question even a primary school student wouldn't get wrong. The contrast is striking.

He believes that this actually reveals a fundamental problem in the current architecture.

The problem is that they are too "versatile".

You can let them do anything as long as you train them enough and adjust the parameters accurately.

But precisely because of this, we have ignored the key issue - "Is there a better way to represent knowledge and think about problems?"

Now, people are piling everything into the Transformer and using it as a universal tool. When there is a lack of a certain function, they just add a module to it.

We clearly know that we need uncertainty modeling and adaptive computing capabilities, but we choose to add these features externally instead of rethinking the architecture itself.

To escape this cycle, Jones significantly reduced his Transformer-related research at the beginning of 2025 and turned to more exploratory directions.

He and his colleagues at Sakana AI, such as Luke Darlow, designed Continuous Thought Machines (CTM) inspired by biology and nature.

Project page: https://sakana.ai/ctm/

This is not a wild invention but a simplified simulation of how the brain works.

Neurons in the brain are not static switches but transmit information through synchronous oscillations.

CTM captures this essence: it uses neural dynamics as the core representation, allowing the model to gradually carry out calculations in the "internal thinking dimension".
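A toy sketch of that intuition (my own illustration, not Sakana AI's CTM code): record each neuron's activation over a sequence of internal "thinking" ticks, then measure how synchronized pairs of neurons are, here via a plain Pearson correlation, and treat that synchronization, rather than any single-tick activation, as the representation.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length activation traces:
    +1 for perfectly synchronized neurons, -1 for anti-phase ones."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Three "neurons" oscillating over 100 internal ticks.
ticks = range(100)
neuron_a = [math.sin(0.3 * t) for t in ticks]
neuron_b = [math.sin(0.3 * t) for t in ticks]            # in phase with a
neuron_c = [math.sin(0.3 * t + math.pi) for t in ticks]  # anti-phase with a

sync_ab = pearson(neuron_a, neuron_b)  # close to +1.0
sync_ac = pearson(neuron_a, neuron_c)  # close to -1.0
```

In this simplified picture, the pairwise synchronization values, not the raw activations at any one tick, carry the information, which is the representational shift the CTM work explores.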

He said, "We are not pursuing full biological fidelity, because the brain does not synchronize every neuron with every other. But this idea opens up new research possibilities."

Importantly, when they were doing this research, they did not have the common "pressure to publish first" in the academic circle.

Because no one is working in this direction. They have enough time to polish this paper, solidify the research, and conduct sufficient control experiments.

He hopes that this research can be a "demonstration case" to encourage other researchers to try those research directions that seem risky but are more likely to lead to the next major breakthrough.

Those Who Come After Mourn but Do Not Learn from It

This is one of the most honest statements in the AI field recently.

Llion Jones admits that most current research may just be making minor improvements to the local optimal solution, and the real breakthrough may be in a completely different direction.

He has a deep understanding of this: after all, his own work once rendered the achievements of a previous generation of researchers obsolete.

What's disturbing is that if he is right, then all those who are burying their heads in improving Transformer variants are wasting their time.

All the mixture-of-experts models, architectural fine-tuning, and attention-mechanism variants may become obsolete the instant a new paradigm emerges.

But the trap is that unless someone really makes a breakthrough, you can never be sure if you are trapped in a local optimum.

From the inside, everything looks like progress. Didn't the improvements to RNNs also seem unstoppable, right up until the Transformer appeared?

Similarly, Ilya Sutskever recently commented that merely scaling the current architecture is not enough to achieve AGI:

One consequence of the scaling era is that scaling has sucked all the oxygen out of the room.

Because of this, everyone has started doing the same thing, and we have ended up where we are: a world with more companies than ideas.

So, how should we choose?

Llion Jones doesn't claim to know the future direction. He simply says, frankly, that the Transformer may not be the long-term answer. This is honest, but it offers little to act on.

The problem is that every paradigm shift looks like waste in hindsight, yet it was necessary exploration at the time. We can't skip this stage; we can only pray that someone finds the exit faster.

More reading:

Is the Transformer Dead? DeepMind Is Betting on Another AGI Route

Google Unveils a Transformer Killer, a Major Breakthrough in 8 Years! The CEO Sets a Deadline for AGI

End the Transformer's Dominance! An Alumnus from Tsinghua University's Yao Class Takes Action, Aiming at AI's "Catastrophic Forgetting"

A Break-up Letter from the Father of the Transformer: It's Been 8 Years! The World Needs a New AI Architecture

Reference materials:

https://www.youtube.com/watch?v=DtePicx_kFY&t=1s

This article is from the WeChat official account "New Intelligence Yuan". The author is New Intelligence Yuan. It is published by 36Kr with authorization.