
What is the much-discussed "Agentic reasoning"? How is it used? Where are the future opportunities? Read this article to find out.

[Account deleted] 2026-01-27 18:53
Charting the course for the next generation of adaptive, collaborative intelligent agents

Reasoning is the core of intelligence. With this ability, artificial intelligence (AI) models can perform tasks such as logical inference, problem-solving, and decision-making in open and dynamic environments.

In the current paradigm shift from "words to actions", large language models (LLMs) are no longer passive sequence generators. Instead, they are expected to evolve into autonomous reasoning agents that can plan, act, and learn in real time during continuous interaction. Agentic reasoning has therefore become one of the hottest research directions in the AI large-model industry.

Recently, a review article titled "Agentic Reasoning for Large Language Models" went viral on X. This review systematically traced the evolution of Agentic reasoning and pointed out the direction for the next-generation adaptive collaborative agents. The research team is from the University of Illinois at Urbana-Champaign, Meta, Amazon, Google DeepMind, the University of California, San Diego, and Yale University.

Paper link: https://arxiv.org/pdf/2601.12538

This 135-page review covers the three levels of "basic Agentic reasoning", "self-evolving Agentic reasoning", and "collective multi-agent reasoning"; the two key optimization modes of context reasoning and post-training reasoning; the applications of Agentic reasoning in real-world scenarios such as science, robotics, medicine, automated scientific research, and mathematics; and the open challenges and future research directions.

Figure | Overview of Agentic reasoning

If you work in the large-model industry or closely follow the development of large models, a deep understanding of the current state and future development of Agentic reasoning may be helpful to you.

Three levels of Agentic reasoning

In the paper, the research team defines "Agentic reasoning" as follows:

Reasoning is regarded as the core mechanism of agents, covering basic capabilities (planning, tool use, and search), self-evolving adaptation (feedback- and memory-driven adaptation), and collective collaboration (multi-agent cooperation). These capabilities can be achieved through context orchestration or post-training optimization.

Different from the static question-answering mode of traditional LLMs, Agentic reasoning emphasizes continuous interaction between the model and the environment. The reasoning process of a traditional LLM is usually a one-time event: the user inputs a question, and the model generates a text answer based on static training data, with no real interaction with the external environment. This mode performs well on closed, structured tasks, but it cannot execute actions to verify ideas or improve itself during interactions.

Agentic reasoning completely breaks through this limitation. It redefines LLMs as agents that can autonomously perceive, plan, act, and learn in the real world. The essential difference lies in that agents gain the ability to continuously interact with the environment in the time dimension. They need to handle uncertainties, learn from feedback, and even cooperate with other agents to complete complex tasks without standard answers in open and dynamic scenarios (such as scientific experiments, robot control, and financial market analysis).

Figure | Differences between traditional LLM reasoning and Agentic reasoning.

The ability of Agentic reasoning is not achieved overnight. Instead, it gradually progresses along a clear path as the environmental complexity and task requirements increase, including the following three levels:

Level 1: Basic Agentic reasoning

At this level, agents have the basic ability to complete complex tasks in a relatively stable environment. They achieve their goals through task decomposition (planning), invoking external tools (such as calculators, databases, APIs), and active search (such as web retrieval), and can verify results and adjust steps. For example, asking an LLM to write code, run and debug it, or obtain real-time information through a search engine and compile a report. This level is the foundation of Agentic reasoning, focusing on how to act effectively within a known framework.
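The plan-act-observe loop described above can be sketched in a few lines. This is a minimal illustration, not any system from the paper: `fake_model` is a hypothetical stand-in for an LLM planner, and the tool registry holds a single calculator.

```python
# Minimal sketch of a basic agentic loop: plan -> act (call a tool) -> observe.
# `fake_model` is a hypothetical stub; a real system would query an LLM here.

def fake_model(task, observations):
    """Hypothetical planner: pick the next (tool, argument) from the history."""
    if not observations:
        return ("calculator", "2 + 3 * 4")  # decompose: compute the value first
    return ("finish", observations[-1])     # then report the observed result

TOOLS = {
    # eval with empty builtins as a toy sandbox for arithmetic expressions
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def run_agent(task, max_steps=5):
    observations = []
    for _ in range(max_steps):
        tool, arg = fake_model(task, observations)
        if tool == "finish":
            return arg
        observations.append(TOOLS[tool](arg))  # act, then record the feedback
    return None

print(run_agent("What is 2 + 3 * 4?"))  # prints 14
```

The loop is the essential shape: the model proposes an action, the environment (here, a tool) returns an observation, and the model conditions its next step on that observation.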

Level 2: Self-evolving Agentic reasoning

When the environment starts to change and tasks become uncertain, agents need to learn from experience. The mechanisms of feedback integration and memory-driven adaptation enable agents to cope with the ever-changing environment. A reflection-based framework (such as Reflexion) allows agents to criticize and optimize their own reasoning processes, while reinforcement learning methods (such as RL-for-memory) formalize the formation and extraction of memory as strategy optimization. Through these mechanisms, agents can dynamically integrate the reasoning process with the learning process and gradually update their internal representations and decision-making strategies without complete retraining. This continuous adaptation mechanism closely links reasoning with learning, enabling the model to accumulate capabilities and achieve cross-task generalization.
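A reflection loop of this kind can be sketched as follows. This is a hedged toy illustration of the Reflexion *pattern* (attempt, verify, store a textual reflection, retry), not the original implementation; `attempt`, `verify`, and `reflect` are hypothetical stand-ins for an LLM actor, an environment check, and an LLM critic.

```python
# Toy sketch of a Reflexion-style self-critique loop: the agent attempts a
# task, a verifier scores the result, and a textual "reflection" is stored in
# episodic memory to steer the next attempt -- no weight updates involved.

def attempt(task, memory):
    """Hypothetical actor: the first try is wrong unless memory warns it."""
    if any("operator precedence" in note for note in memory):
        return 14   # corrected attempt, guided by the stored reflection
    return 20       # naive left-to-right evaluation of 2 + 3 * 4

def verify(answer):
    return answer == 14  # stand-in for unit tests / environment feedback

def reflect(answer):
    return "reflection: respect operator precedence, multiply before adding"

def reflexion_loop(task, max_trials=3):
    memory = []                         # episodic memory across trials
    for _ in range(max_trials):
        answer = attempt(task, memory)
        if verify(answer):
            return answer, memory
        memory.append(reflect(answer))  # learn from failure without retraining
    return None, memory

answer, notes = reflexion_loop("evaluate 2 + 3 * 4")
```

The key property is that adaptation lives in the memory list, not in the model parameters, which is exactly what distinguishes this level from post-training.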

Level 3: Collective multi-agent reasoning

Collective multi-agent reasoning expands agents from isolated problem-solvers to a collaborative ecosystem. Multiple agents no longer work independently but cooperate through clear role assignment (such as "manager-executor-supervisor"), communication protocols, and a shared memory system to achieve common goals. When agents specialize in their tasks and optimize each other's outputs, the collaborative mechanism can significantly improve the diversity of reasoning, enabling the system to conduct debates, resolve differences, and reach consensus through multi-round interactions based on natural language. However, this complexity also brings challenges in terms of stability, communication efficiency, and credibility. Therefore, a structured coordination framework and strict evaluation criteria need to be established.
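The "manager-executor-supervisor" pattern with a shared memory can be sketched as below. This is an assumed toy pipeline for illustration only (not a specific framework from the paper): each role is a plain function that reads and writes a shared dictionary, standing in for LLM agents exchanging messages.

```python
# Illustrative manager-executor-supervisor pipeline over a shared memory.
# Each role is a stub function standing in for a specialized LLM agent.

shared_memory = {"task": "summarize: agents cooperate via roles", "log": []}

def manager(mem):
    """Decompose the task (here: naively split it into word-level subtasks)."""
    mem["subtasks"] = mem["task"].split(": ")[1].split()
    mem["log"].append("manager: split task into subtasks")

def executor(mem):
    """Carry out the subtasks and produce a draft result."""
    mem["draft"] = " ".join(w.upper() for w in mem["subtasks"])
    mem["log"].append("executor: produced draft")

def supervisor(mem):
    """Check the draft against an acceptance criterion before sign-off."""
    ok = mem["draft"].isupper()
    mem["log"].append(f"supervisor: draft accepted={ok}")
    return ok

for role in (manager, executor):
    role(shared_memory)
accepted = supervisor(shared_memory)
```

The shared log doubles as the communication channel and the audit trail, which is why the paragraph above stresses structured coordination and strict evaluation: without them, the same wiring can amplify errors instead of correcting them.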

Two system optimization modes of Agentic reasoning

In the process of building an Agentic reasoning system, whether it is the construction of basic capabilities, self-evolving adaptation, or multi-agent cooperation, all system constraints and optimization settings can ultimately be attributed to two complementary system optimization modes: context reasoning and post-training reasoning.

The core of context reasoning is not to change model parameters but to scale computation at inference time: rather than modifying the model weights, more computational resources are invested during the inference stage (i.e., at test time) to expand the agent's capabilities.

This mode emphasizes that through structured orchestration, search-based planning, and adaptive workflow design, agents can dynamically handle complex problem spaces without changing their internal parameters. It transforms the reasoning process from a static "one-time prediction" into a dynamic cycle of "thinking" and "doing".

Different from context reasoning, post-training reasoning aims to modify model weights and focuses on the internalization of capabilities. Its goal is to transform successful reasoning patterns into part of the model parameters, thereby solidifying them as the model's "instinct".

Using reinforcement learning (RL) and supervised fine-tuning (SFT), agents learn when to invoke tools, how to make plans, and how to verify results under the guidance of massive interaction data or specific reward signals. This kind of training enables the model to more directly and efficiently mobilize internal knowledge when facing similar problems, without cumbersome trial-and-error searches during testing.
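One way such a reward signal might look, sketched under purely illustrative assumptions (the penalty weight and the function itself are made up for this example, not taken from the paper): reward a correct final answer, and subtract a small cost per tool call so the policy learns *when* invoking tools is worth the expense.

```python
# Hedged sketch of a trajectory-level reward for RL post-training of an agent.
# The 0.05 per-call penalty is an arbitrary illustrative constant.

def trajectory_reward(final_answer, gold_answer, num_tool_calls):
    correctness = 1.0 if final_answer == gold_answer else 0.0
    tool_cost = 0.05 * num_tool_calls   # discourage gratuitous tool use
    return correctness - tool_cost

r = trajectory_reward("14", "14", num_tool_calls=2)  # 1.0 - 0.1 = 0.9
```

A policy optimized against a signal shaped like this internalizes the trade-off, which is what lets the trained model skip test-time search on familiar problems.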

Real-world applications of Agentic reasoning

Agentic reasoning is reshaping the way of solving complex problems by decomposing complex tasks into planning, tool invocation, search, and feedback loops.

Figure | Overview of the applications of Agentic reasoning.

1. Mathematical exploration and code generation

Different from traditional pure-model reasoning, existing systems such as OpenHands can write code, execute tests, and debug errors by integrating programming environments, thus obtaining accurate feedback in a stable environment. In mathematical exploration, agents can transform complex logical reasoning into verifiable program outputs through the "thinking-coding-execution" cycle; in the field of code, this ability evolves into "Vibe Coding", where agents handle cumbersome syntax while human developers focus on high-level logic construction.
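The execution half of that cycle, running candidate code and feeding errors back for a repair step, can be sketched with the standard library. This is a generic illustration, not OpenHands's actual mechanism; the repair step itself (an LLM re-prompt) is left out.

```python
# Sketch of the "execute and observe" step of a thinking-coding-execution
# cycle: run candidate code in a subprocess and capture output or the error
# text that would be fed back to the model for repair.
import subprocess
import sys

def run_candidate(code, timeout=10):
    """Execute candidate Python code; return (ok, stdout_or_last_error_line)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode == 0:
        return True, proc.stdout.strip()
    # The last traceback line (e.g. "NameError: ...") is the most useful
    # feedback to hand back to the model.
    return False, proc.stderr.strip().splitlines()[-1]

ok, msg = run_candidate("print(sum(range(5)))")       # succeeds with "10"
bad_ok, err = run_candidate("print(undefined_name)")  # fails with a NameError
```

Precise, machine-readable feedback like a traceback line is exactly what makes the coding environment "stable" in the sense the paragraph describes: the agent gets an unambiguous signal to act on.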

2. Scientific discovery

In fields such as materials science, biology, and chemistry, agents can independently design experiments, run simulations, and analyze massive amounts of data. For example, by planning complex experimental processes, agents can iteratively optimize molecular structure design or read thousands of research papers and finally generate reviews. This "autonomous research" mode not only expands the scale of scientific research but also makes the integration and verification of interdisciplinary knowledge possible.

3. Embodied agents

Embodied agents need to transform natural language instructions into physical actions of robots. In this process, reasoning is not only logical but also spatial and physical. Agents need to combine visual perception (such as object recognition and scene layout) with motion planning to achieve goal navigation, object manipulation, and cooperation with humans in a dynamic environment. This closed-loop perception-decision-feedback mechanism is the key for robots to achieve general operation capabilities.

4. Healthcare

In the high-risk field of healthcare, Agentic reasoning is used to assist in diagnosis, drug discovery, and the formulation of personalized treatment plans. By accessing the latest medical databases and clinical guidelines, medical agents can integrate patients' multi-modal data (such as medical records and images) and provide evidence-based reasoning paths. More importantly, multi-agent systems can simulate the consultation process of doctors from different departments, improve the accuracy of diagnosis and the robustness of treatment plans through debates and cooperation, and use retrieval tools to ensure that the knowledge is up-to-date.

5. Autonomous web exploration and research

Facing the vast and dynamic Internet, agents have the ability to autonomously browse web pages and extract information. They can operate browsers like humans, read web content, fill out forms, and even autonomously evaluate the credibility of information. This ability is widely used in market research, competitor analysis, and the automatic generation of in-depth industry research reports. Through long-term task planning and memory mechanisms, agents can handle complex tasks that require multiple rounds of searches and cross-page reasoning.

Future challenges of Agentic reasoning

Despite the broad application prospects, multiple obstacles need to be overcome to build truly intelligent, reliable, and safe agent systems.

1. Personalization

Most current agents optimize "average performance" and ignore individual differences among users. The future challenge lies in how to enable agents to quickly capture and adapt to users' unique preferences, workflow styles, and feedback habits through reasoning. It is not just about remembering some facts but about adjusting reasoning logic and decision-making styles to truly achieve personalized service for each individual.

2. Long-term interaction

In the real world, tasks often span days or even months. How to maintain focus on tasks, ensure the coherence of memory, and handle interruptions and changes during task execution over a long time span is a huge challenge. Existing context window limitations and memory management mechanisms still struggle to support such ultra-long-term interactions without forgetting or logical drift.

3. World modeling

To plan and act more robustly, agents need to build an internal "world model", that is, an accurate understanding of the physical laws, causal relationships, and dynamic changes of the environment. This can not only help agents make better decisions in unknown environments but also predict the consequences of actions through simulation, improving safety.

4. Multi - agent training

The capabilities of a single agent are limited, but training hundreds or thousands of agents to work together faces challenges such as scalability, credit assignment, and communication efficiency. How to design an efficient training framework to enable multiple agents to develop collective intelligence during cooperation, instead of falling into chaos or inefficient cycles, is a key technical barrier to the next-generation agent systems.

5. Governance framework

When agents are given the permission to execute actions, the security risks increase exponentially. How to build an effective governance framework to ensure that agents' behaviors conform to human values, prevent abuse, and enable timely intervention and accountability in case of errors is the primary prerequisite for the large-scale implementation of agents in the real world.

This article is from the WeChat official account “Academic Headlines” (ID: SciTouTiao), author: Wang Yueran. Republished by 36Kr with permission.