What changes and opportunities did the field of AI Agents see in the first half of 2025?
In the first half of 2025, AI Agents developed rapidly, igniting an "everything can be an Agent" craze.
This craze was first reflected in an intense "arms race" at the technical foundation: the model layer. At the beginning of the year, DeepSeek broke OpenAI's dominance of the reasoning-model track, shocking the industry. Subsequently, leading companies such as OpenAI, Anthropic, and Google took turns launching major models, including o3 Pro, the Claude 4 series, and Gemini 2.5 Pro.
The leap in model capabilities directly drove an explosion at the application layer. With OpenAI's release of Operator (for online task execution) and Deep Research (for in-depth research), competition on the AI Agent track intensified sharply, and new products emerged one after another.
Large companies are betting on the Agent track: Google plans to release Project Mariner, which can operate browsers and other software, this year; Baidu launched the "Xinxiang" app, positioning it as a general-purpose super agent; and Alibaba's "Xinliu" project is probing the human-machine collaboration efficiency of Agents. However, key issues such as PMF (product-market fit), paths to commercialization, and core product moats still await further exploration by the industry.
AI Agents are the third stage in the development of AI applications, after prompts and workflows. Their core value lies in the ability to perceive the environment, make autonomous decisions, and use tools (Tool Use). We believe continuous iteration driven by reinforcement learning will be the key path for Agents to achieve real breakthroughs and address the challenges above.
Last Sunday, Liu Pengqi, an executive director at Fengrui Capital, and Yan Qianhang, a vice president at Fengrui Capital, held an in-depth live-streamed discussion on the entrepreneurial craze, technological breakthroughs, and development trends of AI Agents in the first half of 2025. The questions they discussed included, but were not limited to:
How should we understand the concept of an AI Agent? Where do industry consensus and disagreement lie on this track?
What technological breakthroughs have occurred in AI applications? Why do industry insiders generally have high expectations for reinforcement-learning-driven Agents?
What are the core ideas of "The Bitter Lesson", often called the "bible" of AI? What do they imply for the development of AI Agents?
How can Agents land in practice? What innovation opportunities arise along the way, and what will the long-term moats be?
We have edited part of the live-stream content below, hoping to offer new perspectives for your thinking.
/ 01 / What unexpected events have occurred in the AI field in the past six months?
Yan Qianhang: From the popularity of DeepSeek at the beginning of the year to the emergence of Agent applications now, what unexpected events have occurred in the AI field in the past six months?
Liu Pengqi: In the first half of this year, after the release of DeepSeek, the entire AI track accelerated significantly, and both the model side and the application side witnessed key changes.
First, on the model side, reasoning models represented by DeepSeek quickly opened up the market, prompting major companies to accelerate their entry and pushing the industry into an "arms race". The more far-reaching significance of DeepSeek is that reasoning models based on reinforcement learning entered the public eye, officially opening a new track for large models.
Beyond the product-level breakthroughs, the pace of model iteration also far exceeded expectations: OpenAI launched o3 Pro, Anthropic released the Claude 4 series, and Google released Gemini 2.5 Pro. Leading companies took turns one-upping each other, completely shattering the earlier prediction that "model iteration is slowing down". Meanwhile, some companies are regrouping: Meta, for example, recently announced an investment of roughly $15 billion in the data-labeling startup Scale AI and reorganized its AI department.
It is worth noting that DeepSeek has shown there is no significant gap between domestic and foreign large-model technology. Large companies are also accelerating their model-level efforts: Alibaba released Qwen 3 (Tongyi Qianwen), and ByteDance released Doubao 1.6. Although some of China's "Six AI Dragons" (Zhipu AI, MiniMax, Moonshot AI, StepFun, Baichuan Intelligence, and 01.AI) have fallen somewhat behind, their flagship products are still iterating rapidly.
Second, the landmark event on the application side was OpenAI's successive releases of Operator (an Agent for simple task execution) and Deep Research (an Agent for in-depth research) at the beginning of this year. As a result, the industry regards 2025 as the "Year of AI Agents".
Chinese teams appear frequently in this wave of AI Agent entrepreneurship: Agent products such as Manus and Genspark have drawn extensive discussion and attention, and large-model companies like MiniMax and Moonshot AI have also joined the fray with Agent products of their own.
Third, the AI programming track has verified PMF, that is, products matching user needs. OpenAI's move to acquire the popular tool Windsurf, the rapid rise of Cursor, and the fast growth of companies such as Lovable, Replit, and Bolt all became hot topics in the industry.
All of this shows that the entire AI market and its tracks are in a state of fervor.
Yan Qianhang: The breakthrough in model reasoning ability is another highlight of the first half of the year. The industry's focus is shifting from the scaling law of pre-training (the data-scale effect) to the scaling law of post-training.
Pre-training improves a model's basic capabilities through parameters, data, and computing power; post-training optimizes model performance through techniques such as reinforcement learning and human feedback. Previously, the scaling law mainly meant continuously investing in parameters, data, and compute to obtain ever more powerful models.
The turning point came when the DeepSeek team launched the R1 model, which applied reinforcement learning at scale in the post-training stage. Even with very little labeled data, this approach can improve a model's reasoning ability, achieving a scaling law for reasoning performance.
There is an interesting phenomenon on the application side. Tech giants such as OpenAI, Google, and Microsoft have all entered the Agent field. Some even argue that OpenAI can essentially be regarded as an "AI Agent company driven by language models".
Previously, we thought AI applications needed to keep a certain distance from model companies; otherwise, while the boundaries of the model remained unclear, applications might be swallowed up by rapid model iteration. In this year's Agent wave, however, some model-focused companies have carved out a place in the application market on the strength of the user experience they deliver.
Currently, there is an "everything can be an Agent" craze in the market. The involvement of large companies has pushed the model side into an all-out arms race. Gemini 2.5 advanced the concept of an AIOS (an agent operating system that embeds a large language model into the OS as its brain), and competition between China's "Six Dragons" and the large companies has turned white-hot. On the application side, companies like Cursor are promoting and validating Agents in existing scenarios.
Liu Pengqi: This war is far from over. Large - model companies are developing their own applications and Agent products, and many startups are also doing the same. The boundaries between models and applications are becoming increasingly blurred, and it remains to be seen who is more likely to win in the long run.
Looking back on the first half of this year, something new happened almost every day, and many conclusions were quickly disproven. Many of our current views may turn out to be wrong; staying open-minded and continuously learning is part of the process.
/ 02 / The three evolutions of AI applications: Where does the Agent paradigm come from?
Yan Qianhang: What is the specific definition of "AI Agent"? What are the essential differences between different applications?
Liu Pengqi: Since OpenAI released ChatGPT at the end of 2022 and propelled AI applications onto a new track, AI applications have handled tasks in roughly three ways:
The first stage is the prompt (dialogue interaction) form. Users input prompts and state their requirements, and the large model directly outputs answers. This is the most basic and common form of AI application.
The second stage is the AI Workflow form. The large model accesses external data sources and completes tasks in multiple steps through nodes and paths pre-defined by humans. Compared with the first stage, a Workflow adds data-reading and processing links but still relies on fixed processes preset by experts: the process is controllable but lacks flexibility and generality. Currently, most applications with good adoption and commercialization take this form, such as Dify (a low-code development platform supporting the rapid construction of marketing copy and user-persona analysis), Coze (intelligent customer service, voice assistants), and LangFlow (a low-code, visual AI application builder). A minimal sketch of the pattern follows.
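To make "nodes and paths pre-defined by humans" concrete, here is a minimal sketch of the Workflow form. The `call_llm` placeholder, the node names, and the marketing-copy scenario are all illustrative assumptions, not any platform's actual API:

```python
# Minimal sketch of an AI Workflow: a human fixes the nodes and their order
# in advance; the model only fills in each step. All names are illustrative.

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion API; assumed, not a real SDK call."""
    raise NotImplementedError

def marketing_copy_workflow(product_brief: str) -> str:
    # Node 1: data reading/processing, the link this stage adds over plain prompts.
    summary = call_llm(f"Summarize the key selling points:\n{product_brief}")
    # Node 2: draft copy from the summary.
    draft = call_llm(f"Write marketing copy based on:\n{summary}")
    # Node 3: polish. The model cannot reorder, skip, or add nodes, which is
    # why Workflows are controllable but lack flexibility and generality.
    return call_llm(f"Polish this copy for a social-media audience:\n{draft}")
```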
With OpenAI's release of Operator and Deep Research, AI applications entered the third stage: the AI Agent (intelligent agent). Its common definition is "an intelligent system that can autonomously perceive the environment, make autonomous decisions, execute tasks, and achieve goals". This can be understood by unpacking the keywords one by one:
"Perceiving the environment" allows AI to more comprehensively understand users' needs, instructions, and contextual information, including long - term memory. At the same time, AI can further change the environment, which depends on the key breakthrough in Tool Use ability during the "task execution" process.
"Autonomous decision - making and planning". Different from Workflow, which relies on the fixed processes preset by experts, Agents can make autonomous decisions on task steps. Although Workflow has an advantage in controllability, it has limitations in flexibility, generality, and generalization ability. Agents with autonomous decision - making ability, although still facing challenges in task - execution success rate, show far - exceeding - expected potential. The combination of these characteristics has pushed the Agent application form in the third stage into the public eye.
/ 03 / How do Tool Use and reinforcement learning empower Agents?
Yan Qianhang: As Pengqi said, the core features of Agents are perceiving the environment, making autonomous decisions, and using tools. Compared with AI applications represented by ChatGPT, then, what exactly are the core advantages of Agents? Which tracks are currently best suited for deployment, and what challenges remain?
Liu Pengqi: The core change of Agents this year is the breakthrough in Tool Use ability.
Specifically, from programming to browser use (Agents simulating user operations in a browser), then to computer use (Agents controlling computer systems), and with the growing adoption of the general-purpose MCP interface (Model Context Protocol, which achieves seamless docking between AI models and external resources through unified specifications), Agents' Tool Use ability has strengthened, letting them obtain outside information far more efficiently.
Previously, the core limitation of large models' world knowledge was that training data included only public data up to a certain cutoff date, lacking timely data and private-domain data. With Tool Use, an AI can autonomously retrieve information and interact with the outside world, improving its information-acquisition ability by an order of magnitude. The sketch below shows the general pattern.
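To illustrate how Tool Use plugs that gap, here is a sketch of a tool declaration in the general function-calling pattern. The exact wire format differs across vendors and MCP servers; the `web_search` name and schema shape are assumptions for illustration:

```python
# Illustrative tool declaration: the model sees this description, decides
# when to call the tool, and the runtime feeds the result back as context.

web_search_tool = {
    "name": "web_search",  # hypothetical tool name
    "description": "Fetch timely public or private-domain information "
                   "that the training data lacks.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

# When the model emits a call such as
#   {"name": "web_search", "arguments": {"query": "Gemini 2.5 Pro release"}}
# the runtime executes the real search and returns the result to the model,
# which is how post-cutoff and private-domain data get injected.
```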
Agents have now verified PMF in the development and programming track. Tools represented by Cursor have proven that some closed-loop operations in programming can be completed entirely by Agents. More importantly, with this year's breakthroughs in reinforcement learning for large models, reasoning ability has improved significantly, further enhancing the practicality of Agents.
Yan Qianhang: Let me add why Agents landed first in the AI programming track. Programming is essentially a combination of text and language data, and its training data is highly structured, so ChatGPT showed strong code-generation ability from the moment it launched. However, early generated code often suffered from hallucinations and could not be connected directly to a compiler for execution and verification.
By integrating the software-development toolchain matured over the past two or three decades, AI programming can form a complete closed loop from code writing and debugging to compilation and output, running independently in a virtual machine environment. This provides strong support for the efficient iteration and experimental verification of Agents; a minimal sketch follows.
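As a sketch of that closed loop, the snippet below runs generated code in a separate process and captures the error output to feed back for the next attempt. A real system would use a proper sandbox (container or VM); the subprocess here is only the minimal version of the idea:

```python
# Write -> run -> read the traceback -> revise: the verification loop that
# makes programming a closed-loop environment for Agents.

import subprocess
import sys
import tempfile

def try_run(code: str, timeout_s: int = 10) -> tuple[bool, str]:
    """Execute candidate code in a child process; return (passed, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=timeout_s
    )
    return proc.returncode == 0, proc.stderr

# ok, errors = try_run(generated_code)
# If not ok, the stderr traceback goes back into the model's context.
```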
By contrast, embodied-intelligence scenarios are harder to land. The core difficulty is that robots must interact directly with the physical world, and there is a significant gap between code instructions and actual execution. It is hard for Agents to make rapid breakthroughs in embodied intelligence through model-level iteration alone.
Tool Use has given Agents a boost. What kind of development, then, will reinforcement learning bring to Agents?
Liu Pengqi: The starting point of this round of Agent deployment is indeed improved Tool Use, but further development will still rely on reinforcement learning. In my view, Agents iterated through reinforcement learning are the path by which future AI applications move toward "ultimate intelligence".
In fact, the concept of "Agent" originated from the field of reinforcement learning. In the classic textbook "Reinforcement Learning: An Introduction", an Agent is defined as "an entity that performs actions in an environment and adjusts its behavior according to environmental feedback to achieve long - term goals", which is highly consistent with the concept of Agents discussed in current AI applications.
"Reinforcement learning" originated from computer science and has since interacted with disciplines such as cognitive science, psychology, and neuroscience. It not only represents the path of iteration and evolution in the field of computer science but is also one of the universal laws of evolution.
The evolution of large models, reinforcement learning included, can likewise be divided into three stages. Take a daily-life example: a student attending classes resembles a large model's self-supervised imitation learning (the pre-training stage on vast amounts of public, unlabeled data); a teacher walking through example problems is supervised fine-tuning (supervised training on specific labeled data); and getting feedback from homework and exams until the knowledge truly sticks is classic reinforcement learning (using a reward model to guide the training of the base model). The same law applies in biology: each species' gene pool is an Agent in its own environment, growing stronger through the survival of the fittest.
The reason the programming field could verify the value of Agents so quickly is that it offers a closed-loop environment with clear data feedback: whether code is correct is easy to verify, the reward signals are unambiguous, and so an Agent's ability can be iterated rapidly. The sketch below makes the point concrete.
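A minimal sketch of such a reward signal, with hypothetical `candidate_fn` and test cases: pass/fail on unit tests yields an unambiguous 0/1 reward, which is what lets reinforcement learning iterate quickly here:

```python
# Code correctness as a verifiable reward: unit tests give a crisp 0/1 signal,
# unlike the fuzzy human-preference scores of open-ended domains.

def reward(candidate_fn, test_cases) -> float:
    """Return 1.0 only if the generated function passes every test case."""
    try:
        for args, expected in test_cases:
            if candidate_fn(*args) != expected:
                return 0.0
    except Exception:  # crashes are also unambiguous failure signals
        return 0.0
    return 1.0

# e.g. reward(generated_sort, [(([3, 1, 2],), [1, 2, 3])]) evaluates to 1.0 or 0.0
```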
In the future, if Agents want to surpass competitors or even human intelligence, they must enter the closed loop of reinforcement learning and autonomously explore learning methods rather than relying on human guidance.
Yan Qianhang: In the past, reinforcement learning has been explored in many fields such as robotics and game AI and has become one of the basic methods to promote the development of AI.
Early on, OpenAI developed robotics and game-AI applications through reinforcement learning. Once the base performance of large language models is strong enough, reinforcement learning plays the key role in raising the ceiling of a model's ability. In other words, reinforcement learning releases its greatest value only after the base model has a certain level of capability.
Take tennis as an analogy: the coach must first teach the basic swing, which practice then continuously optimizes and iterates. If the basics are not mastered, or are wrong, heavy reinforcement training may instead entrench the mistakes, hurt performance, and cap the ceiling. The final ceiling of a model's ability is therefore determined jointly by the base model's performance and its reinforcement learning.
So before using reinforcement learning to develop Agents, developers need to consider two questions. First, does the Agent follow the law of "good base performance first, then a higher ceiling through reinforcement learning"? Second, when will the industry reach the critical stage where reinforcement learning delivers a huge boost to Agents?
Liu Pengqi: From what we observe, although many companies have released Agents, a closer look at their technical documents reveals significant differences in approach, which fall roughly into two forms:
The first is the fully end-to-end Agent trained with reinforcement learning, represented by OpenAI's Deep Research and Kimi's Researcher. "End-to-end" means the entire process, including context understanding, tool invocation, and multi-step chains of thought, is completed within one overall framework. Currently, only model companies have this capability.
The second is the modularly split Agent, which assigns different capabilities to different models or Agents that jointly complete a task within an engineering framework. Manus is a typical example, and this modular approach currently seems better suited to breadth-first, general-purpose tasks. In such a framework, the decision-making and reasoning part might use a model like DeepSeek R1 while the programming part uses a Claude model; reinforcement learning mainly improves each module's single-point capability, and external engineering then connects the modules for stronger overall performance. A sketch of the pattern follows.
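Here is a sketch of that modular split, with an assumed `call_model` placeholder and generic model labels that only echo the examples above; the routing logic is illustrative, not any company's actual framework:

```python
# Modular Agent: different models cover different single-point capabilities,
# and external engineering glues them into one overall system.

def call_model(model: str, prompt: str) -> str:
    """Placeholder for a multi-provider client; assumed, not a real SDK."""
    raise NotImplementedError

def modular_agent(task: str) -> str:
    # Planning/reasoning module (e.g. a DeepSeek-R1-class model).
    plan = call_model("reasoning-model", f"Break this task into steps:\n{task}")
    results = []
    for step in plan.splitlines():
        # Programming module (e.g. a Claude-class model) handles each step;
        # reinforcement learning improves this module's single-point ability.
        results.append(call_model("coding-model", f"Implement: {step}"))
    # The engineering layer stitches module outputs into the final answer.
    return call_model("reasoning-model", "Combine into one result:\n" + "\n".join(results))
```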
Yan Qianhang: Currently, reinforcement learning has achieved results in improving single-point capabilities, but end-to-end training still awaits a breakthrough.