Yann LeCun blasts Geoffrey Hinton: He endorses LLMs just because he wants to slack off and retire.
This time, LeCun really did go head-to-head with Hinton...
Hinton had never paid much attention to LLMs. Then in 2023, when GPT-4 came out, he suddenly had an epiphany:
"My goodness, these models are already very close to human intelligence. They may have subjective experiences..."
Regarding this change, LeCun said:
I completely disagree and can't understand it.
I feel like he just wants to slack off: "Okay, this is what we need. I can declare victory."
"Yeah, I can retire. Then go around giving speeches about the dangers of AI."
Immediately afterwards, he changed the subject and pointed the finger at another Turing Award winner.
Actually, I said a lot of things years ago that Hinton has only recently realized.
Bengio's situation is similar.
This is why, when the host asked LeCun why he was so "different", he replied:
I never diverged from Hinton and Bengio. They are the ones who changed.
With all this on the table, his former employer inevitably came up.
By early 2024, and especially in 2025, FAIR no longer met the conditions I thought were necessary to maintain innovation, research, and breakthroughs.
A lot of excellent people have left.
As for the reason, LeCun said that Zuckerberg is actually very nice, and the leadership was very supportive of him. However, once Meta got involved in the LLM race, it simply couldn't focus solely on research.
LeCun said he found this very regrettable.
Because in his view, achieving breakthrough research is "actually very simple".
Just hire the best people. These people have a good sense and know what to do. You provide them with the resources they need to succeed, and then...
Get out of the way.
But the host was still not satisfied and kept asking: Why? Why? Why??
The main suspect: Alexandr Wang.
Host:
Was the acquisition of Scale AI one of the catalysts for this pure LLM focus?
LeCun's answer was very straightforward. He answered whatever was asked.
Definitely. But I'm not sure if I have enough internal information to comment.
Zuckerberg may have seen a potential successor in Alexandr Wang, a younger version of himself.
In addition to these, of course, the classic segment was also retained.
LeCun, in a somewhat teasing tone, once again provoked the LLM camp.
JEPA-like world models will dominate the AI world within five years. (Laughs)
This is LeCun's latest podcast interview. He talked with the host for almost an hour and a half about world models, JEPA, why he left Meta, why LLMs can't lead to AGI...
It has been a long time since I listened to an interview word for word like this. I'm exhausted.
I didn't dare skip any part. It was full of highlights, and LeCun kept voicing radical views throughout:
Anthropic is trying to promote AI regulation through fear. I completely disagree with this approach.
LLMs can never be reliable. Not everything is about coding.
Imitation learning just doesn't work. It can't even handle the task of autonomous driving.
What world models aim to do is handle new tasks zero-shot.
If you're doing a PhD, don't work on LLMs. It's meaningless. You can't make a contribution.
There are still a few places really doing research, like DeepMind. But the whole industry is becoming more and more closed.
The full text of the interview is attached below.
To ensure readability, QbitAI made some adjustments to the content without changing the original meaning.
Enjoy.
Why LLMs are not the path to intelligence
Host: You bet on neural networks back then, and everyone was skeptical of you. But it turned out you were right.
Now you're doing something similar, betting against LLMs and the mainstream generative architectures.
You recently founded a new company, AMI, in this direction. What is AMI doing?
LeCun: First of all, let me make it clear. There's nothing wrong with LLMs.
LLMs are the foundation of many very useful AI products. I use them myself. They're good at what they're supposed to do.
But LLMs are not the path to human-level intelligence, not even to animal-level intelligence.
Host: You also helped create some of the earliest major open-source LLMs.
LeCun: That's right. So what is AMI? AMI stands for Advanced Machine Intelligence. Our positioning is AI for the real world.
The AI technologies we're familiar with today are good at language manipulation.
Language is a very special thing. It's particularly suitable for these currently successful architectures.
But what about the real world? It's high-dimensional, continuous, noisy, and chaotic. The difficulty is on a completely different level.
This is also what I've been working on for most of my career. I've picked up the pace over the past five or six years and made substantial progress in the last two.
By the end of last year, it was obvious that Meta was no longer the right place to advance this project, so I left and founded AMI.
Host: This seems to be an industry trend. More and more people are leaving big companies or research labs to start their own businesses with the research directions they're excited about.
LeCun: This is indeed a very strange trade-off.
There are two modes. One is exploratory research, with many directions running in parallel. The other comes when something seems to work: you need to keep pushing it forward, but it's no longer research.
The people doing this work are called researchers, or at least that's what the media calls them, but in fact they've become engineers and product developers.
This has happened several times at Meta.
At the beginning of 2023, Llama 1 developed by FAIR was very promising. Meta specifically created the Gen AI organization to turn it into a real product. Later, Llama 2, Llama 3, and Llama 4 came out.
Llama 4 was a bit disappointing. Zuckerberg was not satisfied with it and reorganized the whole organization, bringing in new people.
But what really happened in the past year is that Meta realized it was falling behind, so it refocused its strategy on catching up with the industry.
The side effect is that a lot of exploratory research has been deprioritized.
My work on JEPA and world models was not affected, but the rest of the company was completely focused on LLMs.
This made it clear to me that Meta was no longer the right place to advance this project.
We had some initial results and needed to shift from research to real-world technology development, scaling, and productization.
At the same time, we also realized that Meta is not really interested in most application scenarios, such as manufacturing.
World models
Host: You're pursuing the general direction of world models. But there are also others approaching world models from a more generative perspective, such as Google's Genie, various video models, VLA, and the 3D space model developed by Fei-Fei Li... How do you compare the JEPA model with these methods?
LeCun: World models are quickly becoming a buzzword, both in the research field and in the industry.
I won't say much about VLA. This path is now generally considered a dead end. It's not reliable enough and requires too much training data.
So what is a world model? Fundamentally, a world model enables an intelligent agent to predict the consequences of its own actions.
I can't imagine how you could build an agent system without the ability to predict the consequences of its own actions. If humans act without considering the consequences, others would think we're stupid.
So that's what a world model is. If you can predict the consequences of your own actions, you can plan a series of actions to complete a task and achieve a goal.
You do this through planning, reasoning, searching, and optimization, rather than by predicting autoregressively, token by token, as an LLM does. You're searching for an optimal sequence of actions to complete a task.
LLMs don't have the ability to predict the consequences of their own actions, nor do they have real planning ability, because reasoning in LLMs is just predicting the next token, not searching.
So, intelligent behavior requires three characteristics.
First, the ability to predict the consequences of actions.
Second, the ability to plan through optimization and search to find a sequence of actions that can produce the right results.
Third, there is the question of how you predict the consequences of actions.
For example, there's an open water bottle in front of me. If I push the bottom of the bottle, it will slide on the table. If I push the top of the bottle, it may tip over.
But we can't precisely predict which direction the bottle will fall. We can't predict these at the pixel level.
The world model in our brains predicts at an abstract level of representation.
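The planning loop described in this answer can be illustrated with a toy example. What follows is only a minimal sketch under loud assumptions, not AMI's or LeCun's actual method: the one-dimensional world, the `world_model`, `cost`, and `plan` names, and the exhaustive search are all hypothetical choices made for illustration. The point is the shape of the computation: candidate action sequences are rolled out inside the model, scored against a goal, and the best one is selected by search.

```python
from itertools import product

# Hypothetical toy world: the state is an integer position on a line,
# and each action moves the agent by -1, 0, or +1.
ACTIONS = (-1, 0, 1)


def world_model(state, action):
    """Predict the state that results from taking `action` in `state`."""
    return state + action


def cost(state, goal):
    """Task objective: distance between a predicted state and the goal."""
    return abs(goal - state)


def plan(state, goal, horizon):
    """Search every action sequence of length `horizon`; return the cheapest.

    Each candidate is rolled out inside the world model, so nothing is
    executed in the real environment while planning.
    """
    best_seq, best_cost = None, float("inf")
    for seq in product(ACTIONS, repeat=horizon):
        predicted = state
        for action in seq:
            predicted = world_model(predicted, action)
        c = cost(predicted, goal)
        if c < best_cost:
            best_seq, best_cost = seq, c
    return best_seq, best_cost


if __name__ == "__main__":
    seq, final_cost = plan(state=0, goal=3, horizon=4)
    print(seq, final_cost)
```

Exhaustive search only works because this toy action space is tiny. A realistic system would swap in a learned model and a smarter optimizer, and, following the bottle example, would predict in an abstract representation rather than in raw pixels.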
JEPA
Host: Is the design of this architecture largely inspired by the human brain?
LeCun: At least it's inspired by cognitive science. There's a big gap between this inspiration and translating it into a specific neural network architecture.
Cognitive science is indeed a motivation. This is what psychologists call System 2. When you take deliberate, reflective actions, you imagine and predict the consequences of your actions and then plan accordingly. This is different from the instinctive, reactive actions of System 1.
So there's an inspiration source, but there's also a lot of empirical evidence showing that you shouldn't generate pixels.
I've been interested in building world models through prediction for a long time.
About five years ago, I had an epiphany. I realized that all the architectures that successfully learned good image and video representations were non-generative.
VAEs (variational autoencoders), or more generally autoencoders, intuitively seem like a natural way to learn abstract representations of inputs. You feed an image into a neural network and train it to reconstruct the input at the output.
But if you do this directly with a large neural network, nothing interesting will happen. It just learns the identity function, which is completely meaningless.
Using a VAE to learn image representations can get you something, but the results are really not good. The same goes for sparse autoencoders.
There's another type of technique called the denoising autoencoder, and MAE (the masked autoencoder) is a variant of it. BERT in NLP is based on a similar idea. You corrupt part of an image and then train a neural network to restore the original image.
FAIR once had a large-scale project on this, investing a lot of computing resources, and the results were very disappointing.
But at the same time, some of the same people, along with others in Paris and New York, were working on another set of techniques, using non-generative architectures.
You take an image, corrupt it, feed the original and the corrupted version into separate encoders, and then use a predictor to predict the representation of the original version from the representation of the corrupted version.
This is JEPA. One encoder encodes one observation, another encoder encodes another observation, and then a predictor predicts the representation of the first from the representation of the second.
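The difference between the two objectives in this answer can be made concrete. The sketch below is an assumption-laden illustration rather than FAIR's implementation: the tiny hand-picked linear maps, the 4-pixel "image", and the function names are all hypothetical, and no training is shown. It contrasts a reconstruction loss, measured in pixel space, with a JEPA-style loss, measured between a predicted and an actual representation.

```python
def linear(weights, vec):
    """Apply a weight matrix (a list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def squared_error(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def corrupt(image, masked_indices):
    """Zero out chosen entries, a stand-in for masking part of an image."""
    return [0.0 if i in masked_indices else px for i, px in enumerate(image)]


def reconstruction_loss(image, corrupted, encoder, decoder):
    """Generative objective: rebuild every pixel of the original input."""
    return squared_error(linear(decoder, linear(encoder, corrupted)), image)


def jepa_loss(image, corrupted, target_encoder, context_encoder, predictor):
    """JEPA-style objective: predict the original's representation."""
    target_repr = linear(target_encoder, image)         # encode the original
    context_repr = linear(context_encoder, corrupted)   # encode the corrupted view
    return squared_error(linear(predictor, context_repr), target_repr)


# Hand-picked (untrained) weights, for illustration only.
image = [1.0, 2.0, 3.0, 4.0]
corrupted = corrupt(image, {3})          # hide the last pixel
encoder = [[1, 0, 0, 0], [0, 1, 0, 0]]   # keeps only the first two pixels
decoder = [[1, 0], [0, 1], [0, 0], [0, 0]]
predictor = [[1, 0], [0, 1]]             # identity map on representations

pixel_loss = reconstruction_loss(image, corrupted, encoder, decoder)
repr_loss = jepa_loss(image, corrupted, encoder, encoder, predictor)
```

With these particular weights the representation simply ignores the pixels it cannot recover, so the representation-space error is zero while the pixel-space error is not. Whether a trained encoder ends up with a useful representation is a separate question that this sketch does not address.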
Problems with current embodied models
Host: Now many robot companies are releasing more and more impressive demos, which seem to show some planning and reasoning abilities, even being able to execute tasks in unfamiliar rooms or with unfamiliar task versions. What do you think?
LeCun: There are indeed real advancements, and some demos are really impressive. But these systems require a huge amount of data for training, either collected through teleoperation or by manually operating the gripper...
They're mainly trained through imitation learning, with a little bit of reinforcement learning in simulation.
The problem is that imitation learning requires a large amount of data, and you need to collect data separately for each task you want the robot to complete. It's costly and relatively fragile.
If the system has a world model and can predict the results of actions, it can directly plan actions to complete a new task without specifically training for this task.
The generalization ability brought by a world model is much greater, and it can cover a wider range of tasks with less training data.
There are indeed synergistic effects between tasks. The more tasks you train the system to complete, the less data it needs to learn new tasks.
But the hope for a world model is to handle new tasks zero-shot. The goal is to solve a large number of problems with little or even zero training data, maybe just a little bit of RL-style fine-tuning.
Humans have this ability completely, and many animals do too.
A 17-year-old can learn to drive in just ten or twenty hours. We have millions of hours of driving data, but we still don't have Level 5 autonomous driving.
Imitation learning can't even handle the task of autonomous driving.
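The contrast drawn here between imitation learning and planning with a world model can be sketched in a few lines. This is a deliberately simplistic illustration under stated assumptions: the lookup-table "cloned policy", the one-dimensional world, and every name below are hypothetical, and real imitation-learning systems generalize far better than a table. The sketch only shows the structural difference: the cloned policy has its task baked in by the demonstrations, while the planner takes the goal as an input.

```python
ACTIONS = (-1, 0, 1)


def step(state, action):
    """Toy dynamics, used both as the environment and as the world model."""
    return state + action


def clone_policy(demonstrations):
    """Behavior cloning as a lookup table: copy the demonstrated action."""
    return {state: action for state, action in demonstrations}


def run_cloned(policy, state, steps):
    """Follow the cloned policy; stay put in states no demonstration covered."""
    for _ in range(steps):
        state = step(state, policy.get(state, 0))
    return state


def run_planner(state, goal, steps):
    """Greedy one-step planning with the model.

    At each step, pick the action whose predicted outcome is closest to
    the goal. The goal is an argument, not something learned from demos.
    """
    for _ in range(steps):
        action = min(ACTIONS, key=lambda a: abs(goal - step(state, a)))
        state = step(state, action)
    return state


# Demonstrations of a single task: walk from 0 to 3, then stop.
demos = [(0, 1), (1, 1), (2, 1), (3, 0)]
policy = clone_policy(demos)

cloned_end = run_cloned(policy, state=0, steps=5)             # demonstrated task
planned_new_goal = run_planner(state=0, goal=-2, steps=5)     # task never demonstrated
```

Asking the cloned policy to reach a new goal would require collecting new demonstrations for that goal, whereas the planner only needs the new goal value.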
Host: One idea is to use video models to generate a large amount of synthetic data for simulation. Even if it's not perfect physically, it can improve the performance of robots in the real world. What do you think?
LeCun: It's still the same question. Why can a 17-year-old learn to drive in 20 hours?
You don't need millions of hours of demonstration data or synthetic data.
If we crack this problem, we won't need to generate data.
We may still need to train in simulation, but we won't need the amount of data and the number of trials and errors required by current systems.
The herd effect in Silicon Valley
Host: An interesting point is that if you're OpenAI and you know that scaling something will make it better, from a business perspective, you don't have much motivation to do things with higher data efficiency.
LeCun: Other companies also don't have the motivation to do things differently. No one can afford to fall behind their competitors. This is a kind of herd effect in Silicon Valley. Everyone is digging in the same trench.
This is why I set the headquarters of AMI in Paris and the US office in New York, not in Silicon Valley.
Host