Fei-Fei Li poured cold water on AGI.
Zhidx reported on November 17th that, a day earlier, Fei-Fei Li, a Stanford University professor and the co-founder and CEO of World Labs, shared her incisive views on the future of AI on the tech podcast Lenny's Podcast. She argued that AI's development cannot keep relying on the Scaling Law and requires fundamental technological innovation, and that "Artificial General Intelligence" (AGI) is more a marketing term than a rigorous scientific one.
Looking back on more than 20 years of research and entrepreneurship, Fei-Fei Li distilled the golden formula of modern AI: the combination of neural networks, big data, and GPUs. To this day, ChatGPT's success rests on that same formula.
However, she also warned that simply "piling up" data scale and computing power is not enough to achieve a breakthrough in intelligence. Current AI still struggles to accomplish many tasks that are easy for humans, such as accurately counting the number of objects in a video or deriving physical laws from observational data like Newton did.
Fei-Fei Li believes that we still have a long way to go before we can develop an AI system with true creativity, abstract thinking ability, and emotional intelligence. "There are still too many things that AI can't do today."
Fei-Fei Li is reserved about the much-discussed concept of AGI, whose definition she finds vague. As a scientist, she would rather solve the fundamental technical challenges facing AI than get drawn into endless debates over definitions.
▲ Fei-Fei Li being interviewed (Source: Lenny's Podcast)
As an entrepreneur, she also admitted that competition in the AI field is extremely fierce. The once-successful recipe of "simple models + massive data", the so-called "bitter lesson", does not fully carry over to applications involving the physical world, such as robotics. That autonomous driving has still not fully matured after nearly 20 years of development is a telling example.
Scarce data and hardware constraints make robots that manipulate objects in three-dimensional space an even harder problem than autonomous driving.
Although the road is long, Fei-Fei Li maintains that AI's progress is the cumulative work of several generations. Today's "linguistic intelligence" alone is not enough: humans rely on spatial intelligence in many critical scenarios. Research on spatial intelligence matters not only for robotics and embodied intelligence, but also for augmenting humans at the embodied level, giving us new capabilities in spatial understanding, object manipulation, and real-world tasks.
The following is a summary of the highlights of Fei-Fei Li's interview. For the full content, please refer to the link at the end of the article:
01.
Emerging from the Winter
Modern AI Finds Its Golden Formula
In the interview, Fei-Fei Li recalled her early days in the AI field.
In 2000, Fei-Fei Li began her doctoral studies at the California Institute of Technology. As part of an early generation of machine learning researchers, she focused in part on neural networks. AI was then in a "winter": the public paid little attention to the field, and funding was scarce.
Fei-Fei Li's academic interest has always centered on visual intelligence. In her view, if human intelligence depends heavily on vision, then machine intelligence must also start from "understanding the world". So during her doctoral studies and the early years of her teaching career, she chose the most fundamental and difficult direction: object recognition.
At that time, the important value of data for AI was not widely recognized. As the research deepened, Fei-Fei Li and her students gradually realized that big data is the key to making AI come alive.
So she made an ambitious decision: collect image data for every kind of object on the Internet. Around 2006, the ImageNet project began. It eventually gathered 15 million images covering 22,000 object categories and ran an annual challenge.
This seemingly crazy project became the spark of modern AI. In 2012, Hinton's team used ImageNet data and two ordinary gaming GPUs to train a breakthrough neural network model. The combination of big data, neural networks, and GPUs is what Fei-Fei Li calls "the golden formula for modern AI".
Fast forward ten years: when ChatGPT emerged and made the world truly grasp the power of AI for the first time, the three key elements behind it were still the same combination: neural networks, big data, and GPUs. Fei-Fei Li believes the only real difference is scale.
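To make the "golden formula" concrete, here is a minimal, purely illustrative PyTorch sketch of the three ingredients working together: a small neural network, a batch of labeled image data (random tensors standing in for a real dataset such as ImageNet), and a GPU whenever one is available. It is a sketch of the recipe, not anyone's actual training code.

```python
# Illustrative sketch: neural network + (stand-in) big data + GPU.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"  # use a GPU if present

# 1. Neural network: a small convolutional image classifier.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),  # 10 stand-in object categories
).to(device)

# 2. "Big data": random tensors as a placeholder for a labeled image dataset.
images = torch.randn(256, 3, 64, 64)
labels = torch.randint(0, 10, (256,))
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(images, labels), batch_size=32
)

# 3. GPU-accelerated training loop.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(2):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```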
Although she is often called the "Godmother of AI", Fei-Fei Li prefers to emphasize that the progress of AI is not the miracle of one person but the collective accumulation of several generations of researchers.
02.
I Don't Know the Difference Between AI and AGI
Maybe Turing Didn't Either
How far is AGI? This question has almost become a must-answer for all AI scholars, experts, and corporate executives in interviews. In Fei-Fei Li's view, the concept of AGI is quite intriguing, and few people can clearly define it.
Fei-Fei Li said bluntly, "I entered the AI field inspired by a question - can machines think and act like humans? From this perspective, I don't know the difference between AI and AGI." She also imagined that if Alan Turing were still alive and asked about the difference between AI and AGI, he might just shrug and say, "I asked the same question in the 1940s."
AI is the "North Star" guiding Fei-Fei Li forward. She said she doesn't want to get caught in the rabbit hole of defining AI and AGI. AGI is more like a marketing term than a scientific one. As a scientist and technology expert, she doesn't care how others call this technology.
Fei-Fei Li emphasized in the conversation that although larger datasets, more GPUs, and expanding existing model architectures can still bring performance improvements, the development of AI cannot rely solely on the Scaling Law.
Current AI still cannot accomplish many tasks that even children handle easily, such as accurately counting the number of chairs in a video, let alone deriving new natural laws from observation as Newton or Einstein did. Even given all the data collected by modern instruments, AI still cannot rediscover the 17th-century laws of motion.
These examples show that we still have a long way to go before we can develop an AI with true creativity, abstract thinking ability, and emotional intelligence. In the future, fundamental technological innovation is needed rather than simply piling up computing power.
Recently, Fei-Fei Li published a long article detailing the concept of spatial intelligence and proposed that spatial intelligence is the next frontier of AI. In the interview released yesterday, she also shared similar views. Fei-Fei Li believes that relying solely on linguistic intelligence is not enough because humans rely on spatial intelligence in many critical scenarios, such as emergency decision-making at the scene of a fire, traffic accident, or natural disaster.
These activities require an immediate grasp of objects, actions, spatial relationships, and situations, which language alone cannot provide. Her robotics research gradually convinced her that the key to embodied intelligence lies in understanding the three-dimensional world.
Against this backdrop, the "world model" has become a key direction for the next stage of AI development. Unlike traditional language models, a world model can not only generate a complete virtual world from text or images but also let agents interact and reason within it. Applied to robots, a world model becomes the basis for planning paths, understanding scenes, and executing operations.
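As a rough illustration of that idea, the toy sketch below is hypothetical (the names and structure are assumptions, not any real product's API): a text prompt yields an explorable scene with 3D object positions, and an agent can move through it and re-observe, the kind of loop a robot would use to plan paths and execute operations.

```python
# Toy, hypothetical sketch of the world-model idea: prompt -> explorable
# 3D scene that an agent can act in and observe. Not a real API.
from dataclasses import dataclass, field

@dataclass
class ToyWorld:
    # Objects with 3D positions, standing in for generated scene geometry.
    objects: dict = field(default_factory=dict)
    agent_pos: tuple = (0.0, 0.0, 0.0)

    def observe(self):
        """Return each object's position relative to the agent."""
        ax, ay, az = self.agent_pos
        return {name: (x - ax, y - ay, z - az)
                for name, (x, y, z) in self.objects.items()}

    def step(self, move):
        """Apply a movement action, then return the new observation."""
        dx, dy, dz = move
        ax, ay, az = self.agent_pos
        self.agent_pos = (ax + dx, ay + dy, az + dz)
        return self.observe()

def generate_world(prompt: str) -> ToyWorld:
    """Stand-in for a world model: a real one would infer geometry and
    semantics from the prompt; here a tiny scene is hard-coded."""
    return ToyWorld(objects={"table": (2.0, 0.0, 0.0), "door": (0.0, 5.0, 0.0)})

world = generate_world("a small room with a table and a door")
print(world.observe())               # spatial layout relative to the agent
print(world.step((1.0, 0.0, 0.0)))   # move toward the table and re-observe
```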
Fei-Fei Li emphasized that the world model and spatial intelligence are not only the key missing links in the development of robotics but also closely related to humans themselves. Humans are embodied intelligent agents, and AI has enhanced our abilities at the language level, such as in writing or software engineering. In the future, the world model can also enhance humans at the embodied level, enabling us to gain new capabilities in spatial understanding, object manipulation, and real-world tasks.
The world model and spatial intelligence will also have a profound impact on design, engineering, and scientific discovery. For example, the discovery of the DNA double helix structure relied on humans' 3D spatial reasoning from a flat 2D X-ray diffraction image, and this kind of cross-dimensional spatial abstraction is currently difficult for AI to achieve. If the world model can make a breakthrough, AI will be able to have this deeper level of spatial reasoning ability.
03.
Marble Is Not a Video Generation Model
The Intense Competition in AI Entrepreneurship Is "Shocking"
Fei-Fei Li also talked about Marble, a recently released product by World Labs. It is an application based on a cutting-edge world model that can generate an explorable three-dimensional world from just a sentence or an image. Users can freely walk, interact, and navigate in these virtual environments, enabling various applications such as creativity, design, virtual production, and robot simulation.
She emphasized that Marble is not just about generating two-dimensional videos but providing a world with a real spatial structure, allowing creators, game developers, designers, and researchers to quickly generate immersive scenarios. Real-world examples include virtual film production, psychological experiments, and the synthesis of robot training environments.
Marble is fundamentally different from video generation models. Fei-Fei Li said that Marble's core focus is spatial intelligence, emphasizing understanding of, interaction with, and reasoning about three- and four-dimensional worlds. The platform also supports exporting scenes as videos or mesh data for creation or simulation.
Fei-Fei Li revealed that World Labs, which has been established for 18 months, now has a team of about 30 people, mainly consisting of researchers and engineers, but also including designers and product personnel.
Fei-Fei Li has had plenty of "entrepreneurial" experience, from running a dry-cleaning shop at 19, to leading AI research at Google Cloud as its chief AI scientist, to the Stanford Institute for Human-Centered AI. She was somewhat mentally prepared for the challenges of starting a company.
Even so, once she actually started an AI company, she was still "shocked" by how intense the competition is, from models and technology to the scramble for top talent. She realized she has to stay vigilant at all times.
04.
Building Robots Is Harder Than Building Autonomous Cars
The 'Bitter Lesson' Doesn't Fully Apply
Fei-Fei Li also talked about the "bitter lesson" proposed by Richard Sutton, a pioneer in reinforcement learning: a simple model combined with massive data is often more effective than a complex model with a small amount of data. For her, this is not a "bitter" but a "sweet" lesson, and it was also the core belief when she built ImageNet. However, she emphasized that this lesson cannot be simply applied to the field of robotics.
The first reason is that robot data is extremely hard to obtain. Language models train on data that is naturally structured as words and tokens, with highly consistent input and output forms; what robots need is action data in the three-dimensional world.
Online videos are abundant, but they lack the action annotations needed to directly train action policies, so robot training has to fall back on teleoperation data or synthetic data. In other words, robot data is not naturally "aligned" the way language is, which makes the "big data" assumption behind the bitter lesson hard to satisfy.
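As a rough sketch of this data gap, the illustrative snippet below (all field names are assumptions, not a real dataset schema) contrasts a naturally aligned language sample with a robot frame: web video can supply the observation, but the action label has to come from teleoperation or synthesis.

```python
# Illustrative contrast between language data and robot data (hypothetical schema).
from dataclasses import dataclass
from typing import List, Optional

# Language training data: a naturally aligned token sequence -- the input
# and the prediction target are the same kind of object.
language_sample = ["the", "robot", "picks", "up", "the", "cup"]

@dataclass
class RobotFrame:
    """One timestep of (hypothetical) robot training data: a 3D observation
    paired with the action actually taken at that moment."""
    rgb: bytes                      # camera frame
    depth: bytes                    # 3D sensing of the scene
    proprioception: List[float]     # joint angles, gripper state
    action: Optional[List[float]]   # commanded motion; None if unlabeled

# A frame scraped from online video: observation only, no usable action label,
# so it cannot directly supervise an action policy.
video_frame = RobotFrame(rgb=b"<jpeg bytes>", depth=b"",
                         proprioception=[], action=None)

# A teleoperated frame: the human demonstration supplies the missing action.
teleop_frame = RobotFrame(rgb=b"<jpeg bytes>", depth=b"<depth map>",
                          proprioception=[0.1, -0.4, 0.7],
                          action=[0.02, 0.0, -0.05])

print(video_frame.action is None)   # True: needs teleoperation or synthesis
print(teleop_frame.action)          # [0.02, 0.0, -0.05]
```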
Secondly, robots are physical systems, not pure software models. Different from language models or vision models, robots are more like autonomous cars - they must operate in the real world and involve various complex factors such as hardware, supply chains, and application scenarios.
Fei-Fei Li reviewed the development of autonomous driving: it has been nearly 20 years since Stanford won the DARPA Challenge in 2005. Although deep learning has accelerated the progress of algorithms, autonomous driving is still not fully solved. And autonomous driving is just a much simpler form of robotics, only needing to avoid collisions on a two-dimensional plane. In contrast, it is more difficult for robots to manipulate objects in three-dimensional space.
Nevertheless, she still believes that big data, the world model, and spatial intelligence will be the keys to the breakthrough of robotics, but we are still in a very early stage of exploration.
05.
Conclusion: In the AI Era
Everyone Has Their Own Place
At the end of the interview, Fei-Fei Li brought up, unprompted, the widespread anxiety over whether AI will replace humans. She believes that no technology should develop at the expense of human dignity and agency, and that this should be the core principle for how technology is developed, deployed, and governed.
Whether it's young artists using AI for creation, farmers approaching retirement participating in AI regulatory decision-making as citizens, or nurses being freed from heavy work with the assistance of AI, the real value of AI lies in enhancing human capabilities and serving human needs.
Regarding the ultimate question of whether AI will replace humans, Fei-Fei Li gave a clear and powerful answer: in the AI era, everyone has their own place.
Source:
https://www.youtube.com/watch?v=Ctjiatnd6Xk
This article is from the WeChat official account "Zhidx" (ID: zhidxcom), written by Chen Junda and edited by Panken, and is published by 36Kr with authorization.