Upending Search Engines: Next-Generation Agentic Deep Research Jointly Proposed by 12 Top Academic Institutions

In an era of information explosion, traditional keyword search can hardly meet complex knowledge needs. A recent paper proposes Agentic Deep Research, a paradigm driven by large language models that automatically plans retrieval paths, gathers evidence over multiple iterative rounds, guides search decisions through logical reasoning, and produces answers at the level of a research report. It may completely upend the traditional search paradigm.

In the era of information explosion, we search, ask questions, and obtain answers every day. But have you ever wondered: Can traditional search really meet our increasingly complex knowledge needs?

At the recently concluded WWDC, Apple for the first time publicly explored integrating AI assistants such as ChatGPT at the system level, shaking the position of Google, its long-standing default search engine.

This is not only a product revolution but also a shift in who controls the gateway to information.

Meanwhile, the market share of traditional search giants is showing a downward trend, while the daily active user numbers of intelligent assistants based on large models, such as ChatGPT, Claude, and Perplexity, continue to rise.

These signals indicate a clear trend:

The way we obtain information is shifting from "keyword search + manual screening" to "asking questions → automatic research → drawing conclusions."

Against this backdrop of change, a recent paper jointly published by top institutions such as UIC, UIUC, Tsinghua University, Peking University, UCLA, and UCSD proposes Agentic Deep Research: a deep information acquisition and reasoning system driven by large language models, which may completely overturn the traditional search paradigm.

Paper link: https://arxiv.org/pdf/2506.18959

Project homepage: https://github.com/DavidZWZ/Awesome-Deep-Research

Entering the "Agentic Deep Research" Era

In the past, search engines relied on keyword matching.

Today, LLMs like ChatGPT and Claude have changed the way we get answers. However, these models still struggle with complex deep-research tasks that require multi-step reasoning and cross-domain integration.

At the beginning of 2025, OpenAI first proposed the concept of "Deep Research" in an official update and described it as follows:

Introducing Deep Research: An agent that uses reasoning to synthesize large amounts of online information and complete multi-step research tasks for you.

Building on this, the researchers' Agentic Deep Research further systematizes the concept and grounds it technically: the LLM becomes an autonomous research agent with a closed loop of reasoning, searching, and synthesizing.

Agentic Deep Research involves automatically planning retrieval paths, gathering evidence iteratively over multiple rounds, guiding search decisions through logical reasoning, and fusing information from multiple sources into answers at the level of a research report.

The goal of Agentic Deep Research is to move from "answering a question" to "systematically completing complex tasks like a researcher."

From Keyword Matching to Intelligent Deep Research

As the cornerstone of modern knowledge acquisition, information retrieval has long relied on traditional keyword-matching search engines (such as Google and Bing).

Such systems rely on web crawling, index building, and static ranking mechanisms, and they handle factual or navigational queries well.

However, when faced with complex, cross-domain questions that demand inference, they lack contextual understanding and the ability to integrate multiple steps. Users often have to sift through fragmented results and assemble conclusions themselves, which imposes a heavy cognitive burden.

With the rise of large language models (LLMs), information retrieval has entered a new stage driven by "language understanding." Q&A systems based on LLMs like ChatGPT and Claude have broken through the keyword limitation and can directly generate answers through natural language dialogue, significantly improving interaction efficiency.

However, such generative models, which rely purely on knowledge stored in their parameters, still have two major flaws. First, their knowledge is only as current as their training data. Second, they suffer from "hallucination": the output may have no basis in reality.

To alleviate these problems, Retrieval-Augmented Generation (RAG) emerged. RAG improves the accuracy and breadth of answers by retrieving from external knowledge bases before generation and bringing in factual evidence.

This paradigm shows significant advantages in tasks such as factual and open-domain Q&A, and it represents the first integration of information retrieval and generation.

However, mainstream RAG still mostly uses a static, single-round "retrieve, then generate" process. It performs poorly on questions that require step-by-step thinking and dynamic planning, and it cannot effectively simulate the way human experts think while they research their materials.

To break through this limitation, the latest research proposes a new agent paradigm called Deep Research. This paradigm gives the LLM human-like "researcher" capabilities: when faced with complex tasks, it can autonomously plan search paths, dynamically issue queries, reason and analyze iteratively, and use external tools to complete a comprehensive, deep synthesis of information.

In this framework, retrieval and reasoning are no longer isolated modules. They alternate and feed back into each other in a closed loop, simulating how an expert actually conducts research.
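The alternation between reasoning and retrieval described above can be sketched in a few lines of Python. This is only an illustrative toy, not the paper's implementation: the in-memory `CORPUS`, the `search` function, and the hard-coded `next_query` (which stands in for an LLM's reasoning step) are all assumptions made for this example.

```python
from typing import List, Optional

# Toy in-memory "search engine" (hypothetical data, not from the paper).
CORPUS = {
    "capital of france": "Paris is the capital of France.",
    "population of paris": "Paris has roughly 2.1 million residents.",
}


def search(query: str) -> Optional[str]:
    """Look up a passage by exact (case-insensitive) query string."""
    return CORPUS.get(query.lower())


def next_query(question: str, evidence: List[str]) -> Optional[str]:
    """Stand-in for the LLM's reasoning step (hard-coded for this example).

    It inspects the evidence gathered so far and decides what to search
    next, or returns None when it judges the evidence sufficient.
    The question itself is ignored by this hard-coded stand-in.
    """
    if not evidence:
        return "capital of France"
    if len(evidence) == 1:
        # The second hop depends on the entity found in the first hop.
        city = evidence[0].split()[0]
        return f"population of {city}"
    return None


def deep_research(question: str, max_rounds: int = 5) -> List[str]:
    """Alternate reasoning and retrieval until no further query is needed."""
    evidence: List[str] = []
    for _ in range(max_rounds):
        query = next_query(question, evidence)
        if query is None:
            break
        passage = search(query)
        if passage is None:
            break  # nothing found; a real agent would reformulate the query
        evidence.append(passage)
    return evidence


if __name__ == "__main__":
    findings = deep_research("What is the population of the capital of France?")
    print(" ".join(findings))
```

The point of the sketch is that the second query cannot be written until the first result has been read, which is exactly what a single-round "retrieve, then generate" pipeline cannot do.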

Therefore, from traditional Web Search → LLM Chatbot → LLM with RAG → Agentic Deep Research, we are witnessing a profound leap in the information acquisition paradigm: from "static lookup" to "intelligent research."

Double Support from Benchmark Results and TTS Law

In large-scale empirical evaluations, the researchers compared 5 general LLMs (such as GPT and Claude-3.5), 4 LLMs that emphasize reasoning ability (such as DeepSeek-R1 and OpenAI o1), and 1 typical Agentic Deep Research model (the OpenAI Deep Research agent) on three challenging benchmarks: BrowseComp, BrowseComp-ZH, and Humanity's Last Exam (HLE).

The results show that standard LLMs usually score below 10% accuracy on the BrowseComp series and rarely exceed 20% on HLE, while the Deep Research agent, with its closed loop of reasoning and retrieval, scored 51.5%, 42.9%, and 26.6% respectively. This supports the claim that reasoning-driven retrieval pays off on complex tasks.

The paper also analyzed star trends of public GitHub repositories and found that, since the beginning of 2025, the star curves of projects such as DeepResearcher, R1-Searcher, and DeerFlow have been significantly steeper than those of traditional RAG libraries, indicating the community's strong attention to this paradigm and its rapid pace of iteration.

More importantly, these performance leaps are corroborated by the Test-Time Scaling Law (TTS Law) proposed by the authors.

Analyzing experimental data from the AIME24 mathematical reasoning set and the MuSiQue multi-hop Q&A set, the paper found that as the number of reasoning steps or retrieval rounds increases, model scores on the respective tasks show nearly linear gains, forming a clear diagonal gain plane in a three-dimensional coordinate system.

This law not only explains why the Deep Research agent can significantly outperform single-round RAG and pure-reasoning LLMs on benchmarks such as BrowseComp and HLE but also provides a practical budget allocation criterion for system implementation:

Under a fixed cost, fact-intensive queries should allocate more tokens to retrieval, while logic-intensive questions need sufficient reasoning depth to reach optimal performance.
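As a rough illustration of this allocation idea, the sketch below splits a fixed token budget between retrieval and reasoning. The function name `split_budget`, the `fact_intensity` score in the range 0 to 1, and the simple proportional split are all assumptions made for this example; the article does not state the paper's actual allocation rule.

```python
from typing import Tuple


def split_budget(total_tokens: int, fact_intensity: float) -> Tuple[int, int]:
    """Split a fixed token budget into (retrieval, reasoning) shares.

    fact_intensity is a hypothetical score in [0, 1]: values near 1 mean
    the query mostly needs facts looked up, values near 0 mean it mostly
    needs multi-step reasoning. The proportional split is an assumption.
    """
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    if not 0.0 <= fact_intensity <= 1.0:
        raise ValueError("fact_intensity must be between 0 and 1")
    retrieval = round(total_tokens * fact_intensity)
    reasoning = total_tokens - retrieval  # the remainder goes to reasoning
    return retrieval, reasoning


if __name__ == "__main__":
    print(split_budget(1000, 0.75))  # fact-intensive query
    print(split_budget(1000, 0.25))  # logic-intensive question
```

In practice the intensity score would itself have to be estimated, for example by the model classifying the query before it starts working.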

In summary, the significant improvement in benchmark results demonstrates the effectiveness of Agentic Deep Research, and the TTS Law reveals a predictable mechanism behind its gains. Together they lay a theoretical and empirical foundation for building efficient, controllable deep research agents with quantifiable costs.

The Open-Source Ecosystem Is Also Focusing on This Direction

Agentic Deep Research does more than outline a conceptual blueprint for next-generation information retrieval. Beyond increased investment from large companies such as OpenAI and Google, it has quickly gained broad consensus and practical uptake in the academic and open-source communities.

In terms of research activity, a large number of papers on topics such as "reasoning-enhanced retrieval," "deep research agents," and "reinforcement learning search agents" emerged in 2025. Representative works include DeepResearcher, Search-R1, and R1-Searcher, which systematically advance information acquisition technology driven by reasoning ability.

These studies are no longer satisfied with the fixed processes under traditional supervised learning. Instead, they use reinforcement learning, environmental interaction, and task feedback mechanisms to endow language models with the ability to autonomously explore, plan strategies, and make dynamic corrections.

More notably, a thriving ecosystem has quickly formed in the open-source community.

Several deep research agent systems, such as deepresearch, DeerFlow, and ODS (Open Deep Search), have received thousands of GitHub stars in a short time, reflecting widespread attention and enthusiasm from developers and researchers.

According to the paper's analysis of open-source trends, Agentic Deep Research projects as a whole show a continuously rising star growth curve and lead traditional RAG projects over the same period.

This trend shows the strong technical appeal of the paradigm and suggests that a virtuous cycle is forming: products drive it, research feeds back into it, and the community builds it together.

Therefore, whether judged by breakthroughs in model ability, the clarity of the technical path, or the activity of the ecosystem, Agentic Deep Research is at a critical stage of moving from a cutting-edge theory to a mainstream paradigm, suggesting that the era of "letting AI complete research tasks" is not far away.

Evolutionary Route to "AI Researchers"

The paper also raises several key open problems, including human-in-the-loop supervision mechanisms, cross-modal fusion of multi-source information, multi-agent collaborative research systems, efficient reasoning and search with adaptive token budget control, and deep research systems for vertical domains such as law, biology, and medicine.

This is not only an evolution of the search paradigm but also a reshaping of the way humans interact with information in the LLM era.

Reference materials:

https://arxiv.org/pdf/2506.18959

This article is from the WeChat official account "New Intelligence Yuan". Author: New Intelligence Yuan. Editor: LRST. Republished by 36Kr with permission.

The views in this article are solely those of the author. The 36Kr platform provides only information storage services.
