
Google's latest version of "Deep Research" counterattacks GPT-5.2

新智元, 2025-12-12 12:05
Google and OpenAI stage a year-end showdown.

Google and OpenAI are now locked in open rivalry, bombarding each other with a barrage of new products.

Last night, OpenAI struck back at Gemini 3 with the expert-level GPT-5.2!

More than an hour before GPT-5.2's release, however, Google preemptively launched its brand-new Gemini Deep Research agent.

Google has reimagined Gemini's Deep Research, making it more powerful than ever.

The new Deep Research agent is built on Gemini 3 Pro;

Through multi-step reinforcement-learning training, it improves accuracy and reduces hallucinations;

It can handle massive amounts of context and provides citation-source verification for every point it makes.

Beyond the Deep Research agent update, two other brand-new capabilities are also introduced:

Open-sourcing DeepSearchQA, a new web-research agent benchmark for verifying the comprehensiveness of agents on web-research tasks;

Launching the brand-new Interactions API.

Although GPT-5.2 has only just been released and direct comparisons are not yet possible, Lukas Haas, a product manager at Google DeepMind, revealed on X:

The latest Gemini Deep Research agent scored 46.4% on Google's new benchmark, comparable to GPT-5 Pro on BrowseComp, but at a price an order of magnitude lower.

Deep research, even "deeper"

Gemini Deep Research is an agent optimized for long-horizon context collection and synthesis tasks.

Its reasoning core is the Gemini 3 Pro model, Google's most factually accurate model to date, specially trained to reduce hallucinations and maximize report quality on complex tasks.

By extending multi-step reinforcement learning to search, the agent can independently navigate complex information environments with high precision.

Gemini Deep Research reached a leading 46.4% on the full Humanity's Last Exam (HLE) test set, an excellent 66.1% on DeepSearchQA, and a high 59.2% on BrowseComp.

Deep Research uses an iterative research-planning mechanism: it formulates queries, reads results, identifies knowledge gaps, and searches again.

This version significantly improves the web-search function, enabling the agent to drill into site-specific data.

The agent is optimized to generate well-researched reports at lower cost.

Unlike traditional chatbots, Deep Research is designed as a long-running system whose core strength is handling complex, non-immediate tasks.
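The formulate-queries, read-results, find-gaps, search-again loop can be sketched in a few lines. This is a minimal illustration of the loop's shape only; `search` and `find_gaps` are stand-in stubs, not real Gemini tooling.

```python
def search(query):
    # Stub: a real agent would call a web-search tool here.
    return [f"result for: {query}"]

def find_gaps(notes):
    # Stub: a real agent would ask the model which questions remain open.
    # Here, research is "complete" once three rounds of notes are gathered.
    return [] if len(notes) >= 3 else [f"follow-up question {len(notes) + 1}"]

def deep_research(topic, max_steps=10):
    """Iteratively formulate queries, read results, find gaps, and search again."""
    notes, queries = [], [topic]
    for _ in range(max_steps):
        if not queries:
            break                      # no knowledge gaps left: stop researching
        for q in queries:
            notes.extend(search(q))    # read the results for every open query
        queries = find_gaps(notes)     # identify gaps, plan the next searches
    return notes

report = deep_research("quantum sensors in 2030")
```

The loop terminates either when the gap-checker returns nothing or when the step budget runs out, which mirrors why such agents are long-running rather than single-shot.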

A brief talk about deep research

Deep research is the most frequently used function in daily AI tool usage.

After all, for just $20 a month you get multiple "PhD-level" services. Why not?

In my opinion, deep research is the most effective AI tool for ordinary people to gain an edge in knowledge work.

The intelligence of an agent like Deep Research does not come from brute-force computation by a single model but from its complex agentic workflow.

This workflow simulates the cognitive behavior of human experts facing an unfamiliar field and comprises four closed-loop stages: planning, execution, reasoning, and reporting.

When a user submits a vague macro-instruction (such as "analyze the commercialization path of quantum sensors in 2030"), Deep Research first activates its planning module.

Drawing on Gemini 3 Pro's reasoning ability, the system does not start searching immediately. Instead, using step-back prompting, it decomposes the macro-problem into research paths along multiple sub-dimensions, such as technological maturity, supply-chain bottlenecks, the policy and regulatory environment, and analysis of major competitors.

This planning process is dynamic: in traditional chain-of-thought reasoning the path is usually linear, whereas in Deep Research the planning tree is expandable.

If an unforeseen new concept surfaces during the initial search, the system modifies the research plan in real time and adds new branches for deeper exploration.
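An expandable planning tree of this kind can be pictured as follows. The class and field names are illustrative assumptions, not Google's implementation.

```python
class PlanNode:
    """One research question in the plan; children are its sub-dimensions."""
    def __init__(self, question):
        self.question = question
        self.children = []

    def add_branch(self, question):
        child = PlanNode(question)
        self.children.append(child)
        return child

    def open_questions(self):
        """Leaf questions are the ones the agent still needs to research."""
        if not self.children:
            return [self.question]
        leaves = []
        for child in self.children:
            leaves.extend(child.open_questions())
        return leaves

# Step-back decomposition of the vague macro-instruction into sub-dimensions.
plan = PlanNode("commercialization path of quantum sensors in 2030")
maturity = plan.add_branch("technological maturity")
plan.add_branch("supply-chain bottlenecks")
plan.add_branch("policy and regulatory environment")
plan.add_branch("major competitors")

# Mid-research, a search surfaces an unforeseen concept: graft a new branch
# onto the tree in real time instead of restarting the whole plan.
maturity.add_branch("diamond NV-center sensing")
```

The key property is that grafting a branch changes only the affected subtree; the rest of the research plan, and any results already gathered, stay intact.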

DeepSearchQA: A benchmark test for deep research agents

In the benchmarks above, you may have noticed something called DeepSearchQA.

This is a benchmark Google developed specifically for deep-research agents: a brand-new way to evaluate how agents perform on complex multi-step information-retrieval tasks.

DeepSearchQA comprises 900 hand-designed causal-chain tasks across 17 fields, where each step depends on the preceding analysis.

Unlike traditional fact-lookup tests, DeepSearchQA evaluates research completeness by requiring agents to generate exhaustive answer sets, while also testing accuracy and information recall.

DeepSearchQA can also serve as a diagnostic tool for weighing the time-benefit trade-off.

In internal evaluations, Google found that agent performance improves significantly when agents are allowed more search and reasoning steps.

Comparing the results of pass@8 and pass@1 proves the value of allowing agents to verify answers by exploring multiple trajectories in parallel.

These results are calculated based on a subset of 200 prompts from DeepSearchQA.
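The pass@1 vs. pass@8 comparison can be computed with the standard unbiased pass@k estimator (introduced with the Codex evaluations). The numbers below are purely illustrative, not Google's results.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate given n sampled trajectories, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k sample must contain at least one success
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers: 8 trajectories per prompt, 3 of which succeed.
p1 = pass_at_k(8, 3, 1)   # chance a single sampled trajectory succeeds
p8 = pass_at_k(8, 3, 8)   # chance at least one of all 8 trajectories succeeds
```

The gap between the two values is exactly what parallel exploration buys: any prompt with at least one successful trajectory counts under pass@8.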

Interactions API: Designed for Agent application development

The Interactions API provides a set of interfaces designed natively for agent application development, efficiently handling the complex context management of interleaved messages, chains of thought, tool calls, and their state.

Beyond the Gemini model suite, the Interactions API also ships its first built-in agent: the Gemini Deep Research agent.

Next, Google will expand its roster of built-in agents and add support for building and bringing in other agents, letting developers connect Gemini models, Google's built-in agents, and their own custom agents through a single API.

The Interactions API provides a single RESTful endpoint for interacting with models and Agents.

The Interactions API extends the core generateContent functionality with the features modern agent applications need, including:

Optional server-side state: offload history management to the server. This simplifies client-side code, reduces context-management errors, and may cut costs by raising cache hit rates.

Interpretable and composable data model: a clear schema designed for complex agent histories. You can debug, manipulate, stream, and reason over interleaved messages, thought processes, tool calls, and their results.

Background execution: offload long-running inference loops to the server without keeping a client connection open.

Remote MCP tool support: the model can directly call a Model Context Protocol (MCP) server as a tool.
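The "interpretable and composable data model" can be pictured as a typed, interleaved history. The entry types and field names below are illustrative assumptions, not the actual wire format.

```python
# Hypothetical interleaved agent history: messages, thoughts, tool calls, and
# tool results as typed entries in a single list.
history = [
    {"type": "message", "role": "user",
     "content": "Research the quantum-sensor market."},
    {"type": "thought",
     "content": "Decompose into sub-questions before searching."},
    {"type": "tool_call", "name": "web_search",
     "args": {"query": "quantum sensor market size 2030"}},
    {"type": "tool_result", "name": "web_search",
     "content": "Top results: three market reports."},
    {"type": "message", "role": "model",
     "content": "Preliminary findings follow."},
]

# Because every entry is typed, a client can filter, stream, or debug each
# kind independently instead of parsing one opaque transcript.
tool_calls = [e for e in history if e["type"] == "tool_call"]
thoughts = [e for e in history if e["type"] == "thought"]
```

This is what "composable" buys in practice: each slice of the history is addressable on its own.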

With the Interactions API, Google is trying to redefine how developers build AI applications, shifting from a "stateless request-response" model to "stateful agent interaction".

Most current LLM APIs are stateless: developers must maintain the entire conversation history on the client and send tens of thousands of tokens of context back to the server with every request.

This not only increases latency and bandwidth costs but also makes building complex multi-step agents extremely cumbersome.

The Interactions API introduces server-side state management.

Developers simply create a session through the /interactions endpoint, and Google's servers automatically maintain the full conversation context, tool-call results, and the agent's internal thought state.
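The difference is easiest to see side by side. Only the /interactions endpoint is named in the article; the request shapes and field names below are assumptions for illustration.

```python
conversation = [
    {"role": "user", "content": "Summarize quantum sensing."},
    {"role": "model", "content": "Quantum sensing exploits quantum effects."},
    {"role": "user", "content": "Now compare the main vendors."},
]

# Stateless pattern: the client re-sends the entire history on every call,
# so payload size grows with the conversation.
stateless_request = {
    "model": "gemini-3-pro",
    "messages": conversation,
}

# Stateful pattern: after creating a session via the /interactions endpoint,
# the client sends only a session reference plus the newest message; the
# server holds the rest of the context, tool results, and thought state.
stateful_request = {
    "interaction": "interactions/abc123",  # hypothetical session id
    "message": conversation[-1],
}
```

In the stateful sketch the request stays constant-sized no matter how long the conversation runs, which is where the latency and bandwidth savings come from.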

This, to my mind, is the scary part of Google's latest API.

The Interactions API's most revolutionary feature is that it lets developers directly call Google's pre-trained advanced agents, not just the base models.

For example, developers can embed Google's top-tier research capability into their ERP, CRM, or scientific-research software through a simple API call (specifying agent=deep-research-pro-preview-12-2025).
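Such a call might look like the sketch below. Only the agent id comes from the article; every other field is an assumption made for illustration.

```python
import json

# Hypothetical request body for invoking the built-in Deep Research agent;
# the agent id is from the article, the other fields are illustrative.
request_body = {
    "agent": "deep-research-pro-preview-12-2025",
    "input": "Analyze the commercialization path of quantum sensors in 2030.",
    "background": True,  # let the long-running research job run server-side
}

payload = json.dumps(request_body)
```

A background flag of some form matters here because a deep-research run can take many minutes, far longer than a typical request-response round trip.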

Given that a single Deep Research task may consume hundreds of thousands of tokens in reading and generation, one run can cost several dollars.

Yet compared with a junior human analyst working for hours or even days, that price still offers a very high return on investment.

DeepMind partners with the UK government

Finally, there is another piece of news worth noting.

While Google and OpenAI battle it out, Google DeepMind has begun cooperating at the national level.

As an AI giant born in London, DeepMind is conducting an unprecedented "AI-governed country" experiment with the UK government through Deep Research and its underlying technologies.

The cooperation goes beyond scientific exploration, reaching into the capillaries of public administration, with breakthrough progress on the UK's long-standing housing crisis and inefficient planning system.

Project Extract: Cracking the "data silos" in urban planning

The UK's urban planning system has long been regarded as a bottleneck hindering economic growth and housing construction.

Each year, local councils need to process about 350,000 planning applications, and a large number of historical planning archives still exist in the form of paper, scanned PDFs, or hand - drawn maps.

Planners often have to spend hours searching through dusty archives to find the boundaries of underground pipelines or protected areas demarcated decades ago.

To solve this pain point, DeepMind has cooperated with the UK government's AI incubator (i.AI) to develop the Extract tool.

This is not simple OCR software but a sophisticated geospatial-intelligence system built on Gemini's multimodal reasoning.

Unstructured information understanding:

Extract first uses Gemini's vision-language capability to read low-quality scanned documents. It recognizes not only text but also the semantics of handwritten annotations (for example, distinguishing the "approval date" from the "application date" in marginal notes), achieving 94% date-recognition accuracy.
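A request of this kind might be framed as below. This is an illustrative sketch only: the model name, file URI, and field names are assumptions; only the approval-versus-application-date distinction comes from the article.

```python
# Hypothetical document-understanding request like the one Extract might issue:
# a scanned page plus an instruction targeting the semantically correct date.
extract_request = {
    "model": "gemini-3-pro",
    "contents": [
        {"type": "image", "uri": "file:///archives/plan-scan.png"},
        {"type": "text",
         "text": ("From the marginal notes, return the APPROVAL date, "
                  "not the application date, formatted as YYYY-MM-DD.")},
    ],
}
```

The point of the prompt is semantic disambiguation: plain OCR would surface both dates, while the instruction forces the model to pick the one the annotation's context supports.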

Visual reasoning and polygon extraction: