
DeepMind CEO defines the standard for world models: not only understanding the physical world, but also creating it.

Friends of 36Kr, 2025-08-14 09:56
DeepMind launches the Genie 3 world model to advance the development of AGI; thinking models are the key.

From AI videos that are indistinguishable from reality, to virtual worlds so detailed that flowing water and reflections obey the laws of physics, to models that can actively use tools to self-correct during reasoning: this is not science fiction but the astonishing set of capabilities already demonstrated by DeepMind's latest AI tools.

On August 13, it was reported that Demis Hassabis, CEO of Google DeepMind, had recently appeared on the podcast "Release Notes" to lay out the thinking and strategy behind DeepMind's latest series of technological breakthroughs, with the breakthrough progress of the world model Genie 3 as the core highlight.

In this in-depth conversation, he outlined an exciting yet challenging new era of AI: from AlphaGo's conquest of Go to Deep Think winning the gold medal in the International Mathematical Olympiad; from Genie 3, which can generate a realistic world, to the upcoming "Omni Model", we are standing at a crucial turning point on the path to AGI. However, even though AI can create a complete virtual universe, it may still violate the rules in a game of chess. This paradox of "patchy intelligence" is revealing the deepest secrets of artificial intelligence.

Hassabis pointed out that "thinking models" are the inevitable path to Artificial General Intelligence (AGI). DeepMind's ultimate goal is to launch an Omni Model that integrates language, multimedia, physical reasoning, and generative capabilities. Its core foundation is the continued evolution of world models, which will ultimately deliver comprehensive, consistent intelligence and support the safe deployment of AGI.

This interview was hosted by Logan Kilpatrick, the product lead of Google AI Studio. The following is a transcript of the conversation:

Thinking Models: The Evolutionary Path from Game AI to AGI

Kilpatrick: Today our guest is Demis Hassabis, the CEO of Google DeepMind. Hello, Demis. Thank you for coming. I'm very glad to talk with you about the numerous releases and progress we've made in the past few months.

Hassabis: Hello. I'm glad to be here.

Kilpatrick: I'd like to start by talking about this unprecedented momentum of progress. I've noticed that DeepMind has been continuously releasing various achievements recently, including Deep Think, the IMO gold medal, Genie 3, and about fifty other projects, all of which have emerged one after another in the past two months. It's so fast that people tend to forget about them because everything is advancing at breakneck speed. I'd like to hear your general thoughts on this progress and momentum.

Hassabis: Yes, it's very exciting. Over the past few years we've been building momentum and accelerating the pace of both research and releases, and now we're seeing the results of those efforts. I think this is a very exciting time for the industry. There are new achievements almost every day; our teams are releasing new things almost daily. Even internally it's hard to keep up, let alone for the entire field. I'm very proud of all this and very pleased with some of our recent results.

Kilpatrick: So, how do you view Deep Think? One of the things that excites me the most is that a version of this model is now available to subscribers of the Gemini app, allowing people to experience it firsthand. I think the combination of advancing technological R&D while allowing users to directly interact with the product is wonderful. So, from the perspective of Deep Think, how do you think about it?

Hassabis: I think the emergence of "thinking models" can be seen as a return to our early work on game AI, such as AlphaGo and AlphaZero. Since the establishment of DeepMind, we've been developing agent-based systems. In the early days, this meant that the system could complete a full-fledged task, usually excelling at a game because games have clear goals. At that time, our models were single-domain game models, but now we have powerful multimodal models that can handle language and understand and integrate other information.

In game AI, we need to add the ability of "thinking" or "planning" on top of the model. This is the inevitable path to AGI. When a model has the ability to think, it can further extend to "deep thinking" and even achieve parallel planning — that is, simultaneously exploring multiple lines of thought and then making the best decision to move on to the next action.

There is still a vast space for innovation in this direction, but even in the area of "thinking", the progress is very rapid. Whether it's mathematics, programming, scientific problems, or games, such systems must have the ability to think and plan, rather than simply giving the first answer that comes to mind. The core value of a thinking system is to continuously correct and optimize its own reasoning process.
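To make the "parallel planning" idea concrete, here is a minimal best-of-n sketch (my own illustration, not DeepMind's implementation; `generate_candidate` and `score` are toy stand-ins for a model and a verifier): several lines of reasoning are explored independently, and the highest-scoring one is kept.

```python
import random

# Toy best-of-n sketch of parallel planning: sample several candidate
# reasoning traces, score each one, keep the best. The generator and
# scorer below are invented stand-ins for a model and a verifier.

def generate_candidate(problem: str, seed: int) -> str:
    rng = random.Random(seed)
    return f"candidate {rng.randint(0, 999)} for: {problem}"

def score(candidate: str) -> float:
    # A real verifier might check the math, run unit tests, or apply a
    # reward model; here we score at random.
    return random.random()

def parallel_plan(problem: str, n: int = 8) -> str:
    candidates = [generate_candidate(problem, seed=i) for i in range(n)]
    return max(candidates, key=score)

print(parallel_plan("prove the triangle inequality"))
```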

Kilpatrick: I watched the video "The Thinking Game" before and took notes while watching. I found that the DeepMind team actually started on this path a long time ago, and there are many similarities with the process of using Reinforcement Learning (RL) to solve problems back then. For example, the data bottleneck that AlphaFold once faced is very similar to the current dilemma of lacking expert data in fields like programming. Does this situation give you a sense of deja vu?

Hassabis: Indeed. We firmly chose Reinforcement Learning early on. This was one of the first key decisions we made in 2010, alongside Deep Learning. The Atari project at that time was the first deep reinforcement learning system that could truly complete interesting tasks — it could learn to play Atari games from the 1970s directly from screen pixels and outperform any human player. More importantly, it could play any Atari game "out of the box". This universality demonstrated the potential of the new technology to scale up and have practical value.

Personally, since I played chess as a child, I've thought about how to optimize my thinking process. This also motivated me to study neuroscience to explore how the brain works and use artificial intelligence, a powerful tool, to condense wisdom into a digital form. Of course, existing systems perform very well in some aspects, but they still have deficiencies in some relatively simple tasks, such as high-school mathematics, basic logic, or some specially designed small games. They exhibit a kind of "patchy intelligence": astonishing in some dimensions but easily exposed as weak in others.

From Robots to General Assistants: The Multidimensional Potential of Genie 3

Kilpatrick: Many people were shocked after watching the Genie 3 demo. Some even went so far as to say that "this is evidence for the simulation theory". It does trace back to using games to drive the development of reinforcement learning. Looking back at Genie 3, are the results consistent with your initial expectations? After all, improving a model's ability to play games doesn't obviously lead to today's world models.

Hassabis: Genie 3 brings together multiple research paths and ideas. We've always used board games or video games as challenging environments, not only to drive algorithmic progress but also to synthesize data. We build extremely realistic virtual environments to train systems to understand the physical world.

The world model we want to build should not only understand physical structures, material properties, and the flow of liquids but also understand biological and human behavior. Since AGI must understand the physical world to operate in it, this is crucial for robots and also indispensable for general assistant projects like Project Astra (Gemini Live).

One way to verify a world model is to let it generate a virtual world that is consistent with reality. For example, turning on a faucet should result in water flowing out, and you should see your reflection in a mirror. What makes Genie 3 amazing is the consistency of the world it generates. If you turn around and then look back, the world remains the same. This shows that its underlying understanding of physics is quite excellent.

Kilpatrick: How do you think users will use Genie? Is our goal only to use it as a tool to improve Gemini and our robotics projects, or do you think it has more uses in its own right?

Hassabis: It is exciting along multiple dimensions. First, we are already using it for training. For example, we have a game agent called SIMA (Scalable Instructable Multiworld Agent) that can operate and play an existing computer game out of the box. Sometimes it performs well, and sometimes it doesn't.

Interestingly, we can put SIMA into Genie 3, which is like an AI acting in another AI's "mind". SIMA will issue operation instructions based on a goal (such as finding a key in a room), and Genie 3 will generate the game world in real time. This can create an infinite amount of training data, which is valuable for robot training or general training of AGI systems.
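As a rough illustration of this agent-in-world-model loop (not DeepMind's actual API; all class and method names below are invented, with toy strings standing in for real frames and actions), the two models feed each other: the agent chooses an action toward a goal, and the world model generates the consequence.

```python
import random

# Hypothetical sketch of a SIMA-like agent acting inside a Genie-3-like
# generative world model. Everything here is a made-up stand-in.

ACTIONS = ["move_forward", "turn_left", "turn_right", "interact"]

class ToyWorldModel:
    def reset(self, prompt: str) -> str:
        # A real world model would render the first frame from the prompt.
        return f"frame_0({prompt})"

    def step(self, frame: str, action: str) -> str:
        # A real world model would generate the next frame conditioned on
        # the current frame and the agent's action.
        return f"next({frame}, {action})"

class ToyAgent:
    def act(self, frame: str, goal: str) -> str:
        # A real agent would pick an action that advances the goal; here
        # we just sample one at random.
        return random.choice(ACTIONS)

def rollout(world, agent, goal, horizon=5):
    """One trajectory: the agent acts, the world model 'imagines' the result.

    Trajectories like this can serve as synthetic training data."""
    frame = world.reset(prompt="a room containing a hidden key")
    trajectory = []
    for _ in range(horizon):
        action = agent.act(frame, goal)
        next_frame = world.step(frame, action)
        trajectory.append((frame, action, next_frame))
        frame = next_frame
    return trajectory

if __name__ == "__main__":
    for frame, action, _ in rollout(ToyWorldModel(), ToyAgent(), goal="find the key"):
        print(action)
```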

At the same time, it also has great potential for interactive entertainment. I have many ideas for creating next-generation games, and it may even give rise to a new form of entertainment that lies between movies and games.

Finally, from a scientist's perspective, the most interesting thing is what this can tell us about the real world, physical laws, and even the simulation theory. When you generate an entire virtual world late at night, you can't help but wonder: What is the nature of the real world? This is also the motivation that has driven me to use AI for scientific research throughout my career. I think models like Veo 3 and Genie 3, when observed from a different perspective, can give us insights into the nature of reality.

The Ability Gap in AI: Powerful Generative Capabilities Coexist with Elementary Mistakes

Kilpatrick: This brings us back to the issue of "patchy intelligence" we talked about earlier. On the one hand, we already have amazing systems that can generate a complete virtual world; on the other hand, I might even be able to beat Gemini in a game of chess, and sometimes it even violates the rules. Recently, we announced that DeepMind is collaborating with Kaggle to launch a "Game Arena" to test models in various games. What do you think?

Hassabis: This reflects a more general problem: today's systems (whether Gemini or competitors' models) are very powerful in many respects. They can generate simulated worlds from text, understand videos, solve math problems, and conduct scientific research. However, anyone who has used these chatbots knows how easy it is to run into the limits of their abilities.

In my opinion, this lack of consistency is exactly the step they still need to take to achieve full-fledged AGI. An ordinary person should not be able to discover a system's elementary flaws so easily. We may have solved the kind of elementary problems, like counting the number of 'R's in 'strawberry', that were used to probe a model's attention to detail, but there are still tasks that an elementary school student can easily complete while the model fails. This is probably because key innovations are still missing in areas such as reasoning, planning, and memory.

Moreover, many of our existing evaluation benchmarks are approaching saturation. For example, in the AIME math test, Deep Think's recent score has reached 99.2%, leaving almost no room for improvement. This may even mean that the test itself has lost its discriminatory power. Therefore, we need to design newer, more difficult, and more comprehensive evaluations to examine a model's physical intuition, understanding of the world, and safety (such as preventing deceptive behavior).

I'm really looking forward to the "Game Arena" because it continues our original motivation for developing game AI. Games are clean testing environments: objective scores, no subjective human grading. The benchmark can automatically increase in difficulty as systems' capabilities improve, and more complex games can be introduced over time. In the future, AI may even create new games of its own and learn by competing with other agents, which avoids data leakage and overfitting. This multi-agent environment will become one of the important, long-term evaluation benchmarks.

Internalizing Abilities vs. External Tool-Calling: Experience-Driven Decision-Making

Kilpatrick: My insight over the past two years is that many problems in life are essentially a form of evaluation. Work performance is an evaluation, and the way you view things is also an evaluation. In the field of games, we have clear constraints and objective results, but once we expand to non - game fields, it's very difficult to define the "ground truth". For example, in human daily tasks, how can we build a reinforcement learning environment? How do you think we should capture these features in non - game environments?

Hassabis: Defining the reward function or objective function has always been the biggest challenge for reinforcement learning in real and chaotic environments. In the real world, there is no single objective function; instead, multiple objectives coexist, and the weights of these objectives change constantly depending on factors such as mood, environment, and career stage.

I think future general systems must learn to understand the user's real intention and convert it into a set of reward functions that can be optimized. This involves research on meta-cognition, or "meta-reinforcement learning" (meta-RL): building a system on top of the main system to infer the main system's optimal objective function. We started trying this kind of research during the game-playing stage of AlphaGo and AlphaZero ten years ago, and it is likely to become a research focus again now.
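As a toy illustration of that idea (purely my own sketch, not DeepMind's method; the objectives, keyword matching, and weights are all invented): a meta-level component maps a user's stated intent to weights over several candidate objectives, and the main system then optimizes the resulting scalar reward.

```python
from dataclasses import dataclass

# Toy illustration of turning a user's intent into a weighted reward
# function that an RL agent could optimize.

@dataclass
class State:
    task_progress: float  # 0..1
    time_spent: float     # hours
    risk: float           # 0..1

# Candidate objective functions; a real set would be learned and richer.
OBJECTIVES = {
    "speed":   lambda s: -s.time_spent,
    "quality": lambda s: s.task_progress,
    "safety":  lambda s: -s.risk,
}

def infer_weights(user_intent: str) -> dict:
    """Stand-in for a meta-level model that reads intent and outputs weights.

    A real system might use a language model here; we keyword-match."""
    weights = {"speed": 1.0, "quality": 1.0, "safety": 1.0}
    if "quick" in user_intent:
        weights["speed"] = 3.0
    if "quality" in user_intent:
        weights["quality"] = 2.0
    if "careful" in user_intent:
        weights["safety"] = 3.0
    return weights

def reward(state: State, weights: dict) -> float:
    """Scalarize the multiple objectives into one optimizable reward."""
    return sum(w * OBJECTIVES[name](state) for name, w in weights.items())

w = infer_weights("please be careful, quality matters more than speed")
print(reward(State(task_progress=0.8, time_spent=2.0, risk=0.1), w))
```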

Kilpatrick: I think we should start working on this now, because it seems that what DeepMind did ten years ago is exactly what everyone is chasing at the frontier today. Back to the "trends in thinking" and "trends in games": historically we've gone through various scaling paths, including pre-training, post-training, data scaling, compute scaling, and more recently, inference-time scaling. For example, Deep Think benefits from improvements in inference ability. Now it seems that "tools" have become a new dimension of scaling. Do you think equipping models with physical simulators as tools will be one of the future directions?

Hassabis: The ability to use tools is one of the most important capabilities of an AI system. The core of a thinking system is that it can actively call tools during the thinking process, such as search engines, mathematical programs, and programming environments, and then adjust the plan based on the results provided by the tools.

Interestingly, in a digital system, it's not as clear as in humans which abilities should be put into the main model (i.e., the "main brain") and which should be used as external tools. For humans, anything that is not part of the body is a tool, but in AI, this boundary is very blurred.

For example, should the ability to play chess be directly built into the main model, or should we call Stockfish or AlphaZero as an external tool? Experience shows that if a certain ability (such as mathematics or programming) can improve the overall reasoning level, it should be put into the main model; but if it may weaken other general abilities of the model, it is more suitable as an external tool. This is entirely an empirical question that needs to be continuously tested and verified in practice.
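A minimal sketch of the external-tool route he describes (all names below are hypothetical; whether a capability belongs inside the model or outside it is, as he says, an empirical question): the reasoning loop checks whether the model requested a tool, runs it, and feeds the result back into the model's context.

```python
# Hypothetical sketch of a reasoning loop that delegates to external tools.
# Tool names and the request format are invented; a real system would call
# an actual engine (e.g., Stockfish) and a real sandboxed calculator.

def chess_engine(position: str) -> str:
    # Stand-in for querying a chess engine for the best move.
    return "e2e4"

def math_tool(expression: str) -> str:
    # Stand-in for a sandboxed calculator or computer-algebra system.
    return "42"

TOOLS = {"chess_engine": chess_engine, "math_tool": math_tool}

def reasoning_step(model_output: dict) -> str:
    """If the model requested a tool, run it and return the result (which
    would be appended to the model's context for the next thinking step);
    otherwise the output is treated as the final answer."""
    tool = model_output.get("tool")
    if tool in TOOLS:
        return f"tool result: {TOOLS[tool](model_output['input'])}"
    return model_output["answer"]

print(reasoning_step({"tool": "chess_engine", "input": "starting position"}))
print(reasoning_step({"answer": "The best opening move is e4."}))
```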

The Blueprint of AGI's Comprehensive Abilities: Integration of Language, Multimedia, and Physical Reasoning

Kilpatrick: Many developers are now pointing out that models are no longer the static weights of the past. Instead, they can call various tools during the reasoning process and are becoming more and more like complete systems. This is changing the way people build applications. What do you think of this shift from "model" to "system"? Do you have any suggestions for developers?

Hassabis: Models are evolving very rapidly, especially when the ability to use tools is combined with planning and thinking abilities. Their potential may expand exponentially because they can combine and use tools in brand - new ways.

I suggest that developers think hard about which tools are most valuable to an AI's abilities and then start building those tools. Even with tool-calling and agent capabilities, these systems are not finished products in themselves; they still require a lot of productization work. The challenge for product managers and designers is to anticipate the state of the technology a year from now and design products for that future.