Transformer co-author makes a bold prediction: there will be no AI winter, and the reasoning revolution will ignite a trillion-dollar market
Are LLMs approaching their limit? Turing Award winners Yann LeCun and Richard Sutton are both pessimistic, but Łukasz Kaiser, co-author of the Transformer paper, pushes back!
There is no winter for AI, only the heatwaves of capital and computing power!
The spark of the Transformer has been burning for seven years. Now, reasoning models are igniting a second revolution.
Łukasz Kaiser, a co-author of the Transformer paper and a researcher at OpenAI, predicts:
In the next year or two, AI will make rapid leaps; the bottleneck lies not in algorithms but in GPUs and energy.
Reasoning models are rewriting the rules. Money and electricity are the "hard currencies" that determine who wins.
In 2017, the Transformer architecture emerged, and its eight co-authors were inscribed in the annals of AI history.
Notably, Łukasz Kaiser joined OpenAI before ChatGPT launched and has focused on reasoning models ever since; he believes they are the most significant breakthrough since the Transformer in 2017.
Recently, he stated publicly that reasoning models are just the beginning, far from a final, defining moment for AI. But perhaps that is exactly what makes them so exciting.
We finally have a machine that can think. Now it's time to make it less flashy and more focused on getting things done.
The Trillion-Dollar Dispute over the AI Roadmap
This is a dispute over AI concepts worth trillions of dollars.
"Artificial General Intelligence" has become the goal most of the industry pursues: a general agent truly capable of human-level cognition.
OpenAI has been burning money and resources, constantly scaling up, plunging Silicon Valley into an "AGI frenzy": LLM + data + GPU + energy = AGI!
When OpenAI released o3, economist Tyler Cowen declared that AGI had arrived and that April 16, 2025, was AGI Day.
Even Karpathy's claim that "AGI is still 10 years away" was considered overly pessimistic in the Bay Area.
But there are those who disagree:
Call it sunk cost or bias, but never call it intelligence.
Silicon Valley's $10-Trillion Illusion
Richard Sutton, the father of reinforcement learning, the 2024 Turing Award winner, and the author of "The Bitter Lesson," asserts that large language models have reached a dead end.
In his view, large language models have not learned any "bitter lessons."
In other words, he points to a key flaw in large language models: there is a limit to how far they can improve, and that limit is much closer than commonly believed.
Turing Award winner Yann LeCun has held similar views for years.
François Chollet, co-founder of the Ndea AI lab and creator of the open-source deep-learning framework Keras, holds the same view.
To him, LLMs are a dead end on the road to AGI, so he co-founded the million-dollar ARC Prize to steer the field back toward it.
Recently, Łukasz Kaiser publicly refuted the view that "LLMs are a dead end."
Although he is not sure whether Sutton's critique covers reasoning-type LLMs, reasoning models represent a fundamental breakthrough: they require several orders of magnitude less training data than traditional models.
Models of this type can genuinely accelerate scientific research. More experiments could run in parallel, but we currently lack the computing power.
Ultimately it is a compute bottleneck, and the key lies in GPUs and energy. This is the fundamental constraint, and every lab faces it. It is why Altman is raising funds so aggressively.
The Reasoning Revolution
LLM reasoning is causing a major paradigm shift in the AI field.
Ordinary users have very likely never interacted with a true reasoning LLM.
Even those who have were routed to one indirectly through GPT-5's routing system, without realizing it.
Reasoning models have the following capabilities:
They can self-reflect, detecting errors in their own chains of thought and adjusting the reasoning path accordingly;
When asked to solve complex problems, they can dynamically allocate more compute by "thinking deeper";
During reasoning, they can call external tools directly to perform operations;
They can generate multiple alternative reasoning paths and select the best one on their own.
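The last capability, sampling several independent reasoning paths and keeping the answer they agree on, is often called self-consistency and can be sketched in a few lines. Everything below is illustrative: `sample_reasoning_path` is a hypothetical stub standing in for a call to a real reasoning model, not any lab's actual implementation.

```python
import random
from collections import Counter

def sample_reasoning_path(question, rng):
    # Hypothetical stub for one sampled chain of thought; a real system
    # would call a model API here. We simulate a model that answers "4"
    # most of the time and occasionally makes an arithmetic slip.
    return "4" if rng.random() < 0.8 else "5"

def self_consistent_answer(question, n_paths=15, seed=0):
    """Sample several independent reasoning paths and majority-vote over
    their final answers -- one simple way to 'generate multiple
    alternative reasoning paths and select the best one'."""
    rng = random.Random(seed)
    answers = [sample_reasoning_path(question, rng) for _ in range(n_paths)]
    best, _count = Counter(answers).most_common(1)[0]
    return best

print(self_consistent_answer("What is 2 + 2?"))  # → 4
```

The voting step is deliberately crude; production systems can instead score whole paths with a verifier or reward model before choosing.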
This is completely different from the era of purely autoregressive large language models like GPT-4.
Moreover, reasoning models have been around for less than a year and are far from reaching their potential limit.
In most reasoning-intensive tasks, OpenAI's first reasoning model, o1, significantly outperformed the then-strongest general model, GPT-4o.
They don't rush to answer. Instead, they "draft in their heads" first: reasoning, retrieving, and calling tools, like a human pausing for a few seconds before replying.
In this mode, AI can not only hold a conversation but also "get things done": write a report, troubleshoot a piece of code, or query a database.
Łukasz Kaiser sees this as a quiet paradigm shift. "It's like going from a conversation generator to a real thinker," he says.
What excites him even more is that reasoning models have a much lower demand for data but can solve more difficult problems.
This is especially evident in structured tasks such as mathematics and program analysis.
Meeting the Father of AGI at 16: The Rapid Evolution of AI
Interestingly, Łukasz Kaiser's first paid job, at age 16, was programming for Ben Goertzel.
Around 2001, Goertzel formally adopted and popularized the term "Artificial General Intelligence" to distinguish it from the "Narrow AI" of the day.
Now, AGI is understood as being able to perform all tasks that humans can do.
However, the reality is that there are fundamental differences between AI and human intelligence.
It has surpassed most people in some domains (such as games and solving math problems), but it remains powerless in matters tied to the physical world: today's robots are still extremely clumsy.
This differential development may be the norm for technological evolution.
Therefore, Łukasz Kaiser believes that the future development path will be:
AI capabilities will keep strengthening. But at least in the short term, jobs tied to the physical world will remain irreplaceably human, both technically and in terms of economic cost.
Compared with conceptual debates, the transformation brought about by reasoning models is more worthy of attention at this stage.
The biggest breakthrough of the past year is that AI can genuinely take on certain workplace tasks and do them quite well:
it not only responds in seconds but can also work continuously for hours to produce valuable results.
This means we can hand our to-do lists over to AI, improving overall efficiency. Whether we call it AGI or not, the fact that AI is growing more and more capable is undeniable.
The programming field is the best example: since AI developers started focusing on this area, the progress has been astonishing.
Both Anthropic's Claude and OpenAI's Codex can now turn a set of requirements into a complete program in just a few hours.
They are good at understanding large codebases, conducting code reviews, and detecting bugs and even security threats; these capabilities were unimaginable a year ago.
Recall that when Claude 3.5 was released about a year ago, it was an epoch-making breakthrough. At that time, the pass rate on the SWE-Bench benchmark was about 30%; now it has reached 75%.
Three months ago, code models were merely auxiliary tools; now they can genuinely handle complex codebases. The implications of this exponential progress are self-evident.
Even as AI develops this rapidly, some people are starting to worry that we are entering another AI winter.
Łukasz Kaiser is relatively optimistic.
The New AI Paradigm: Reasoning Has Just Begun
In the past there was indeed a Transformer paradigm: Transformer + scaling produced ChatGPT.
Of course, this autoregressive paradigm, predicting the next word while training ever-larger models on ever more data, has run for many years.
But general Internet data has essentially been exhausted. The models have been trained on all of it, and no one can easily obtain much more.
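The autoregressive recipe itself is simple to state: at each step the model scores possible next tokens given everything so far, picks one, appends it, and repeats. A toy sketch, in which a hand-written bigram lookup table stands in for the billions of learned parameters of a real model:

```python
# Toy autoregressive generation: repeatedly predict the next token
# from the current context. The bigram "model" is a hand-made table,
# purely for illustration of the loop structure.
BIGRAM = {
    "<s>": "the",
    "the": "model",
    "model": "predicts",
    "predicts": "the",
}

def generate(max_tokens=6):
    tokens = ["<s>"]  # start-of-sequence marker
    for _ in range(max_tokens):
        nxt = BIGRAM.get(tokens[-1])
        if nxt is None:  # no known continuation: stop early
            break
        tokens.append(nxt)
    return " ".join(tokens[1:])

print(generate())  # → the model predicts the model predicts
```

A real LLM replaces the table lookup with a forward pass over the full context, which is exactly why its appetite for training text is so enormous.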
But the new reasoning paradigm has just begun.
Łukasz Kaiser thinks this paradigm is so new that it is only at the very start of a very steep upward path.
Relative to its future capabilities, we have taken only a small step. We know it can already do amazing things.
But we haven't fully exploited it: we've scaled it up a bit, yet there is far more room to grow, and more research directions that could improve it. In this new paradigm, we are on a steep climb.
We're witnessing the rise of the new paradigm, but it requires further in-depth research: some research yields good results, and some turns out mediocre. You never know; that is the exciting part of research.
If you combine the old and new paradigms, then you need to start preparing:
an AI winter is not coming anytime soon. In fact, the improvement over the next one or two years may be dramatic.
After that, the world will change dramatically - it's almost a bit scary.
The breakthrough in reasoning is really huge.
This is no coincidence. Even before GPT-4, OpenAI had started researching reasoning models, because people clearly saw that pure scaling was not economically sustainable and a new paradigm was needed.