
Yann LeCun's complaints on his way out of the company are remarkably blunt.

QbitAI 2025-12-22 09:02
He criticizes Meta for becoming closed and states bluntly that LLMs cannot lead to AGI.

What a relief to speak one's mind!

LeCun, who is about to officially leave Meta at the end of the year, is now holding nothing back.

Skeptical of the idea that large language models can lead to AGI, he pointed out sharply:

The path to superintelligence—simply train large language models, train with more synthetic data, hire thousands of people to "educate" your system in post-training, and invent new tricks for reinforcement learning—I think it's complete nonsense. It simply won't work.

Displeased with the closed-door approach of Meta, which is about to become his "former employer", he also spoke bluntly:

Meta is becoming more and more closed... FAIR is being pushed to work on projects that are more short-term than it traditionally took on.

Moreover, he also revealed that the new company he is going to establish will still adhere to the principle of openness.

The above content is from a recent podcast that LeCun participated in. In the nearly two-hour conversation, he mainly answered:

Why is Silicon Valley's obsession with scaling language models a dead end?

Why is the hardest problem in the AI field achieving the intelligence level of a dog, rather than that of a human?

Why does the new company choose to build a world model that makes predictions in an abstract representation space, rather than a model that directly generates pixels?

...

In summary, whether it's his nearly 12-year research experience at Meta, the new company he's going to establish next, or the AGI he wants to achieve in the future, it's all here.

The next stage of life: Establishing a new company, AMI

After bidding farewell to his employer of twelve years, LeCun's next move is clear—entrepreneurship.

He will keep working on the world model, the line of research that was previously suppressed at Meta.

LeCun revealed that his company is called Advanced Machine Intelligence (AMI). It will prioritize research on world models and will be open source...

This move has put his conflict with Meta out in the open.

As is well known, since Alexandr Wang came on board, Meta has made a sharp turn, going from open-source pioneer to an increasingly closed company.

LeCun was even more outspoken:

FAIR once had a huge impact on the AI research ecosystem, and the core of that impact was its highly open philosophy. But in the past few years, companies including OpenAI, Google, and Meta have all become more closed.

So rather than staying at Meta at the mercy of others, he would rather strike out on his own and do the research he cares about.

Moreover, LeCun emphasized that research results that are not published openly cannot be considered real research. Keeping everything in-house only leads to self-deception; without the scrutiny of the academic community, it is likely to be nothing more than wishful thinking.

He has seen many similar phenomena: internally, a project is highly touted, but people don't realize that what others are doing is actually better.

Moreover, Meta now chases only the impact of short-term projects, which makes it hard to contribute anything of real value. To make breakthroughs, the only way is to publish research results openly.

So the new company is taking a completely different path from Meta's current one.

Not only will it conduct research, but it will also launch practical products around world models and planning capabilities. AMI's ultimate goal is to become one of the main suppliers of future intelligent systems.

The reason for choosing the world model is that LeCun believes:

The correct way to build an intelligent system is through world models.

This is also what he has been committed to researching for many years. The work has advanced rapidly in several projects at New York University and Meta, and now it is time to put that research into practice.

As for where FAIR, the lab he built himself, will go after his departure, LeCun also revealed a little.

First, he said that Alexandr Wang is not his successor at Meta.

Alexandr Wang's internal role leans toward overall operations and management rather than dedicated research. He also leads Meta Superintelligence Labs, which has four departments under it:

FAIR: Focuses on long-term research;

TBD Lab: Focuses on cutting-edge models (mainly LLMs);

AI infrastructure department: Responsible for software infrastructure;

Product department: Transforms cutting-edge models into practical products such as chatbots and integrates them into platforms such as WhatsApp.

Among them, FAIR has been handed over to Rob Fergus, who is also LeCun's colleague at New York University. Currently, FAIR has reduced its emphasis on publishing papers and leans more toward short-term projects and supporting the cutting-edge models of the TBD Lab.

LeCun himself is still an AI scientist at FAIR for now, with only about three weeks left in his tenure.

LeCun's departure marks the complete end of Meta's decade-long golden era of "academic-style" research represented by FAIR, and also marks LeCun's determination to leave LLMs behind and turn to world models.

So the question is, why does LeCun think that world models are right and LLMs are wrong?

The world model he wants to build is "completely different" from LLMs

The core reason is that LeCun believes that they are essentially designed to solve different problems, and they are "completely different things".

The former is designed to handle high - dimensional, continuous, and noisy data modalities (such as images or videos), which form the basis for perceiving and interacting with the real world;

The latter performs well in handling discrete, symbolic text data, but is not suitable for handling the above-mentioned real-world data. LeCun described it as "completely terrible".

He also asserted that "generative models cannot be used" to process image and video data, especially those generative models that tokenize data into discrete symbols (which is the basis of most LLMs).

A large amount of empirical evidence shows that it simply won't work.

Based on this, LeCun firmly believes that AI can never reach human intelligence levels by training on text data alone.

After comparing the massive text data (about 30 trillion tokens) required for LLM training with an equivalent amount of video data (about 15,000 hours), he found:

The information content of 15,000 hours of video is roughly the total visual information a child has received during waking hours by the age of four, yet it corresponds to only about half an hour of uploads to YouTube; and video data has a far richer information structure and higher redundancy than text.

This shows that the internal structure of real - world data such as videos is much richer than that of text.
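
As a rough back-of-the-envelope check (the bytes-per-token and visual-bandwidth figures below are illustrative assumptions, not numbers from the interview), the two data volumes do land in the same ballpark of roughly 10^14 bytes:

```python
# Rough comparison of LLM text data vs. a young child's visual input.
# The per-token and bandwidth figures are illustrative assumptions, not exact values.

text_tokens = 30e12            # ~30 trillion tokens of training text (from the article)
bytes_per_token = 3            # rough average for subword tokens (assumption)
text_bytes = text_tokens * bytes_per_token            # ~0.9e14 bytes

video_hours = 15_000           # waking visual experience by age four (from the article)
visual_bandwidth = 2e6         # ~2 MB/s reaching the visual system (rough assumption)
video_bytes = video_hours * 3600 * visual_bandwidth   # ~1.1e14 bytes

print(f"text : {text_bytes:.1e} bytes")
print(f"video: {video_bytes:.1e} bytes")
# Both come out on the order of 10^14 bytes, which is the point of the comparison.
```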

Precisely because he is convinced that "text cannot carry all the structure and dynamics of the world", LeCun has turned his attention back to a path closer to the essence of human learning: let machines, like infants, actively build an internal, predictive model by observing the continuous changes in the world.

And this is what LeCun envisions as the world model.

In his view, the key role of the world model is to predict the consequences of a specific action or a sequence of actions, and its two cornerstones are prediction and planning.

Prediction: Based on the current state and potential actions, it can deduce possible future states (or abstract representations of states);

Planning: Building on prediction, it searches and optimizes over candidate actions to determine the best sequence of actions for achieving a preset goal (a minimal sketch of this loop follows below).
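
To make the prediction-plus-planning loop concrete, here is a minimal sketch of how an agent could plan with a learned world model: sample candidate action sequences, roll each one forward through the model, and keep the cheapest. The `world_model` and `cost_fn` callables are stand-ins for whatever the real system learns; this is an illustration of the idea, not AMI's actual method.

```python
import numpy as np

def plan(world_model, cost_fn, state, horizon=10, n_candidates=256, action_dim=2):
    """Pick an action sequence by rolling candidates through a learned world model.

    world_model(state, action) -> predicted next state (in an abstract representation space)
    cost_fn(state)             -> how far that state is from the goal (lower is better)
    """
    best_cost, best_actions = np.inf, None
    for _ in range(n_candidates):
        actions = np.random.uniform(-1, 1, size=(horizon, action_dim))  # one candidate plan
        s, total_cost = state, 0.0
        for a in actions:
            s = world_model(s, a)          # prediction: deduce the next (abstract) state
            total_cost += cost_fn(s)       # accumulate distance from the goal
        if total_cost < best_cost:         # planning: keep the best sequence found so far
            best_cost, best_actions = total_cost, actions
    return best_actions  # execute the first action, then re-plan (receding horizon)
```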

As for what constitutes a "good" world model, LeCun refuted the view that it needs to perfectly simulate reality and emphasized the importance of abstraction.

Previously, many people thought that the world model must be a "simulator that reproduces all the details of the world", just like the holodeck in Star Trek.

(The holodeck is a specially designed enclosed room controlled by a computer, which can generate three-dimensional, realistic environments and objects through holographic projection technology.)

But LeCun believes that this idea is "wrong and harmful", and practice has proven that abstraction is sometimes more effective.

All science and simulation work by "inventing abstractions". For example, computational fluid dynamics ignores underlying details such as molecules and only focuses on macroscopic variables (such as velocity, density, temperature), and this kind of abstraction can bring "longer-term and more reliable predictions".

Therefore, an effective method is to learn an abstract representation space that will "eliminate all unpredictable details in the input, including noise".

Thus, he also concluded that world models don't have to be complete simulators. "They are simulators, but in the abstract representation space."

As for the specific implementation method, he currently thinks of making predictions in this abstract representation space through the Joint Embedding Predictive Architecture (JEPA).
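
A deliberately tiny sketch of the JEPA idea in PyTorch (the architectures, dimensions, and the stop-gradient detail below are placeholders; real I-JEPA/V-JEPA systems use vision transformers, masking strategies, and an EMA-updated target encoder): encode two views of the input, and train a predictor so that the representation of one view predicts the representation of the other, with the loss computed in representation space rather than pixel space.

```python
import torch
import torch.nn as nn

class TinyJEPA(nn.Module):
    """Toy joint-embedding predictive architecture: predict representations, not pixels."""
    def __init__(self, input_dim=784, embed_dim=64):
        super().__init__()
        self.context_encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU(),
                                             nn.Linear(256, embed_dim))
        self.target_encoder  = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU(),
                                             nn.Linear(256, embed_dim))
        self.predictor       = nn.Sequential(nn.Linear(embed_dim, 128), nn.ReLU(),
                                             nn.Linear(128, embed_dim))

    def forward(self, context_view, target_view):
        z_context = self.context_encoder(context_view)    # representation of the visible part
        with torch.no_grad():                              # target branch is not trained directly
            z_target = self.target_encoder(target_view)   # representation of the masked/future part
        z_pred = self.predictor(z_context)                 # predict the target *representation*
        return nn.functional.mse_loss(z_pred, z_target)   # loss lives in the abstract space

model = TinyJEPA()
context = torch.randn(8, 784)   # e.g. visible patches of an image, flattened
target  = torch.randn(8, 784)   # e.g. masked patches of the same image
loss = model(context, target)
loss.backward()
```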

And how did the idea of JEPA come about? LeCun takes us through the tortuous development history of "how AI learns" over the past 20 years.

From unsupervised learning to JEPA

LeCun admitted that for nearly two decades, he has always believed that the correct path to building intelligent systems is some form of unsupervised learning.

Just as infants come to understand the world without having it "labeled" for them, true intelligence cannot be built by relying on large amounts of manually labeled data.

Therefore, he initially focused on unsupervised learning. This design of "letting machines discover patterns from raw data on their own" perfectly fits his concept.

He immediately took action and started trying to train autoencoders. The core logic is: first compress, then restore.

For example, an image (the input data) is compressed by an encoder into a compact, low-dimensional "summary" (i.e., a representation or feature); then a decoder reconstructs from this "summary" an image as similar as possible to the original input.

Once this "summary" can almost perfectly restore the original input, it is reasonable to assume that it must have captured the most critical and essential information in the data.

Therefore, if this "summary" is used for other tasks later, it is likely to perform well.
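
A minimal PyTorch sketch of that compress-then-reconstruct loop (the layer sizes are arbitrary, chosen only for illustration):

```python
import torch
import torch.nn as nn

# Minimal autoencoder: compress the input to a low-dimensional "summary", then reconstruct it.
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))   # image -> summary
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))   # summary -> image

x = torch.randn(16, 784)                    # a batch of flattened 28x28 "images"
z = encoder(x)                              # the compact representation (32 numbers per image)
x_hat = decoder(z)                          # attempted reconstruction of the original input
loss = nn.functional.mse_loss(x_hat, x)     # how far the reconstruction is from the input
loss.backward()                             # this signal is what trains the representation
```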

However, later research made LeCun realize that "the intuition that the representation must contain all input information is wrong".

Because he found that there was a "cheating" phenomenon in the above - mentioned learning process of AI.

Just like the "identity function" in mathematics, the output is simply the input in another form. The AI doesn't really understand what it has learned; it is merely "copying the answers".

Without understanding, how can there be true intelligence?

So, LeCun then introduced the core idea of the "Information Bottleneck" to correct the direction.

Its purpose is to limit the information content of the representation, thereby forcing the system to learn a more concise and useful representation, that is, the so - called abstraction ability.
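
The simplest way to impose such a limit is to make the representation itself the bottleneck: shrink its dimensionality, inject noise, or penalize its magnitude so the encoder cannot simply copy the input through. A rough sketch of that idea (a loose illustration, not the formal information-bottleneck objective):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 8))    # very narrow code
decoder = nn.Sequential(nn.Linear(8, 128), nn.ReLU(), nn.Linear(128, 784))

x = torch.randn(16, 784)
z = encoder(x)
z_noisy = z + 0.1 * torch.randn_like(z)        # noise further limits how much z can transmit
x_hat = decoder(z_noisy)

recon = nn.functional.mse_loss(x_hat, x)       # still try to reconstruct the input...
info_penalty = z.pow(2).mean()                 # ...but penalize "information-rich" (large) codes
loss = recon + 0.01 * info_penalty
loss.backward()
```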

Later, he and several students did a lot of work in this direction, hoping to pre - train very deep neural networks.

However, with the historical turning point of deep learning, the rise of fully supervised learning, research on unsupervised and self-supervised learning was put on hold for a time.

The situation at that time was as follows.

In the early 2010s, researchers faced a core problem: deep neural networks, for all their expressive power in theory, were extremely difficult to train in practice. Gradients either vanished or exploded, and the parameters in the deeper layers of the network could hardly learn anything.

Several simple but revolutionary engineering improvements completely changed the situation.

One is the victory of ReLU (Rectified Linear Unit). Previously, people generally used Sigmoid or Tanh as activation functions. Their gradients become very flat at both ends (the saturation regions), causing the gradient signal to decay rapidly during backpropagation and making it impossible to effectively update the deep-layer weights. This is the "vanishing gradient" problem.

The gradient of ReLU is always 1 in the positive interval, which solves the vanishing gradient problem. It is also extremely fast to compute, and it almost single-handedly enabled training to go dozens or even hundreds of layers deep.
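
The contrast is easy to see numerically (a small illustration, not taken from the interview): the sigmoid's derivative collapses toward zero for large positive or negative inputs, while ReLU's derivative stays at exactly 1 for any positive input, so the backpropagated signal is not attenuated there.

```python
import torch

x = torch.tensor([-6.0, -2.0, 0.5, 2.0, 6.0], requires_grad=True)

# Sigmoid: derivative sigma(x) * (1 - sigma(x)) saturates at both ends.
s = torch.sigmoid(x)
s.sum().backward()
print("sigmoid grads:", x.grad)   # ~[0.002, 0.105, 0.235, 0.105, 0.002]

x.grad = None

# ReLU: derivative is exactly 1 wherever the unit is active (x > 0).
r = torch.relu(x)
r.sum().backward()
print("relu grads:   ", x.grad)   # [0, 0, 1, 1, 1]
```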

The other is the effectiveness of normalization. As the number of network layers increased, the distribution of the input to each layer would change drastically, forcing subsequent layers to constantly adapt to the new data distribution, which greatly slowed down the training speed and made the setting of hyperparameters such as the learning rate extremely sensitive.

Normalization technology forces the input of each layer back into a stable distribution (roughly zero mean and unit variance), so that later layers no longer have to chase a constantly shifting target.
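
A minimal illustration of that effect, using batch normalization as the canonical example (the numbers are synthetic):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)                  # normalizes each of the 4 features across the batch

x = torch.randn(32, 4) * 50 + 100       # inputs with a wildly shifted scale and offset
y = bn(x)                               # after normalization: roughly zero mean, unit variance

print(x.mean(dim=0), x.std(dim=0))      # ~100 and ~50 per feature
print(y.mean(dim=0), y.std(dim=0))      # ~0 and ~1 per feature
```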