
Are LLMs just "wordsmiths in the dark"? Fei-Fei Li: The next battleground for AI is "spatial intelligence"

Friends of 36Kr, 2025-11-11 18:19
Fei-Fei Li: The next battlefield for AI is spatial intelligence, which will drive the development of world models.

On November 10th, US local time, Fei-Fei Li, the "Godmother of AI", published an essay arguing that the next battlefield for generative AI is "Spatial Intelligence". For the first time, she systematically explained what spatial intelligence is, why it matters, and how to build world models that can unlock it.

Fei-Fei Li pointed out the "fatal flaw" in current AI: it is just a "wordsmith in the dark", understanding language but not the world. Current AI has mastered a vast amount of abstract knowledge, but it knows almost nothing about the common sense and spatial laws of the physical world, such as "what shape is an object?", "how much force is needed to knock over a cup?", or "will turning a corner cause a collision with a wall?"

This flaw blocks the main path along which AI could advance. It is also why autonomous robots still act like toddlers, and why the immersive metaverse experience we long for is still far off.

Professor Fei-Fei Li sounded the alarm: the real breakthrough for AI in the next decade will come not from piling up more words but from unlocking "spatial intelligence", the ability that connects perception, imagination, and action.

After Fei-Fei Li's article was published, it immediately sparked a heated discussion on social platforms:

I'm really looking forward to seeing how the world model will change the way we tell stories, build virtual worlds, and even develop the digital economy.

AI has learned to "see" and "speak", and next, we are teaching it to understand and adapt to the real world we live in.

Spatial intelligence is the missing link in the world model. It will drive a huge leap in the capabilities of LLMs. As long as the causal reasoning ability and energy efficiency reach the corresponding levels, we will stand at the inflection point towards AGI.

The following is the full text of Fei-Fei Li's article:

From Words to the World: Spatial Intelligence is the Next Frontier for AI

In 1950, when computers could only perform automated arithmetic and simple logical operations, Alan Turing posed a question that still challenges us today: Can machines think?

It took extraordinary imagination to see what he saw back then: that intelligence might ultimately be constructed by humans, rather than descending from the heavens. This insight later gave rise to the ongoing pursuit we call "artificial intelligence". Twenty-five years after I embarked on AI research, Turing's vision still inspires me. But how far are we from the goal? The answer is not simple.

Today, cutting-edge AI technologies such as large language models (LLMs) have begun to change the way we acquire and apply abstract knowledge. However, they remain "wordsmiths in the dark": eloquent but inexperienced, knowledgeable but detached from reality. Spatial intelligence will reshape the way we create and interact with real and virtual worlds, driving revolutionary progress in fields such as narrative art, the creative industry, robotics, and scientific exploration. This is the new frontier that AI urgently needs to explore.

Since I entered this field, the pursuit of visual and spatial intelligence has been my guiding star. That's why I spent several years building ImageNet, the first large-scale visual learning and benchmarking dataset. Together with neural network algorithms and modern computing power such as GPUs, it is one of the three pillars of modern AI.

That's also why my lab at Stanford has spent the past decade integrating computer vision and robot learning. And that's why I co-founded World Labs with Justin Johnson, Christoph Lassner, and Ben Mildenhall, hoping to fully realize this vision for the first time.

This article explains what spatial intelligence is and why it matters, and shows how we can unleash its potential by building world models. This transformation will reshape creativity, embodied intelligence, and even the course of human civilization.

Spatial Intelligence: The Cornerstone of Human Cognition

AI has never been as exciting as it is today. Generative AI, led by large language models, has moved from the laboratory into daily life, becoming a tool that billions of people use to create, produce, and communicate. These models have demonstrated capabilities that were once unimaginable: writing fluently, producing large volumes of code, and generating realistic images and even short videos. There is no need to debate whether AI can change the world. By any reasonable definition, the change has already begun.

However, numerous challenges still lie ahead. The vision of autonomous robots remains a concept, far from the everyday reality that futurists predicted. The dream of accelerating research in fields such as disease treatment, new material discovery, and particle physics has mostly not been realized. An AI that can truly understand and empower human creators is still out of reach, whether it's helping students learn molecular chemistry, designers conceive spaces, filmmakers build worlds, or ordinary people pursue immersive experiences.

To understand why these capabilities remain out of reach, we need to trace the evolution of spatial intelligence and examine how it shapes our perception of the world.

Vision has long been regarded as the cornerstone of human intelligence, but its power comes from a more fundamental source. Long before animals could build nests, raise offspring, communicate in language, or establish civilizations, simple perceptual abilities had quietly ignited the spark of intelligent evolution.

This ability to obtain information from the outside world, whether it's capturing a ray of light or perceiving the texture of an object