OpenAI has just released its own Agent mode, very much in the style of Manus.
There is a consensus that the major theme of AI this year is Agent.
If we were once used to AI "talking," the next era will be AI "taking action." After all, if AI can understand and is smart enough, why shouldn't it just get the job done?
At the start of the second half of 2025, OpenAI, which has long been the company defining AI, delivered its own answer on Agents. Interestingly, at first glance it looks strikingly similar to the Manus mode that went viral a few months ago.
Early in the morning on July 18th, Sam Altman and four OpenAI researchers introduced the upcoming Agent mode of OpenAI in a live broadcast.
To put it simply, in Agent mode you can make requests to ChatGPT directly: "I still need a pair of shoes for my wedding. Go to an e-commerce site and buy them for me." Or: "Design some pet accessories for me and send them straight to print." Or: "Search for information and generate a PPT directly." ChatGPT will then spin up a virtual machine on its own and carry out the operations step by step.
In the demonstrations, a complex task took about 10 minutes to complete, but judging from the results, the degree of completion was high. ChatGPT can call a text browser, a visual browser, and a terminal inside the virtual environment; through the terminal, it can further call cloud-service APIs and image generators, and run code.
More importantly, this time OpenAI is not making the feature Pro-exclusive: Plus and Team users will also be able to use it right away, with 40 uses per month, a fairly generous allowance.
With his signature sincere look, Sam Altman said to the camera: "This is a brand-new paradigm. Just as we learned to surf the Internet and eventually learned to spot scams, the whole of society now needs to learn how to interact and coexist safely with Agents."
01 What can the Agent mode do?
Watching OpenAI's demonstration of Agent mode, you will find that the experience feels remarkably similar to the Manus mode that went viral a few months ago.
After the user makes a request, a virtual machine starts automatically to execute the task. During execution, the Agent repeatedly asks the user for confirmation and allows the user to take over manually at any time; the user can also insert new requests mid-task for real-time interaction.
In OpenAI's introduction, Agent mode can call three tools: a text browser, a visual browser, and a terminal. The model chooses on its own when to switch between them.
The design of this tool combination is quite clever: the text browser handles reading large amounts of text and searching for information, while the visual browser takes over to simulate mouse and keyboard interactions once the target has been located, or to read visual information such as images.
The terminal can run code, generate files including PPTs and Excel spreadsheets, and call some cloud APIs.
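OpenAI has not published the internal interface behind this setup, but the division of labor maps naturally onto a simple dispatch loop. Below is a minimal, purely illustrative Python sketch of the idea; every name in it (TextBrowser, VisualBrowser, Terminal, model.next_action) is hypothetical rather than OpenAI's actual API.

```python
# Purely illustrative sketch of a three-tool agent loop; all names here are
# hypothetical and are not OpenAI's actual interface.

class TextBrowser:
    def run(self, action):          # fast, text-only page fetching and search
        return f"[text] fetched: {action.get('query', '')}"

class VisualBrowser:
    def run(self, action):          # simulated mouse/keyboard on a rendered page
        return f"[visual] clicked: {action.get('target', '')}"

class Terminal:
    def run(self, action):          # run code, build files, call cloud APIs
        return f"[terminal] ran: {action.get('command', '')}"

TOOLS = {"text_browser": TextBrowser(),
         "visual_browser": VisualBrowser(),
         "terminal": Terminal()}

def agent_loop(model, task, max_steps=20):
    """Let the model pick a tool each step until it declares the task done."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model.next_action(history)  # e.g. {"tool": "terminal", "command": "python make_ppt.py"}
        if action["tool"] == "finish":
            return action["answer"]
        observation = TOOLS[action["tool"]].run(action)
        history.append({"role": "tool", "content": observation})
    return "step budget exhausted"
```

The point of the split is efficiency: bulk reading goes through the cheap text browser, while the heavier visual browser and terminal are reserved for interaction and computation.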
In the first demonstration OpenAI provided, the researcher asked the Agent to plan for attending a friend's wedding: selecting an outfit that met the dress code (taking the venue, the weather, and a mid-to-high-end price range into account), booking a hotel, and suggesting gifts.
The researcher first switched ChatGPT into Agent mode and sent the requests above. The Agent started the virtual computer and loaded the environment, which took a few seconds.
ChatGPT first used the text browser to open the webpage provided by the user and search for the wedding details, dress code, weather, and so on. When it realized the wedding date needed further confirmation, the model asked for clarification, but the user chose to let it keep reasoning on its own.
After finding the weather and venue information, the AI began recommending suitable outfits, switching to the visual browser to check how each one looked. Once that was done, it moved on to searching for hotels and gifts.
The final wedding-trip report was long and detailed, covering clothing, hotels, and gifts, with a large number of links attached. To show whether hotels had vacancies, it even included a screenshot of the online booking site.
The AI needed only about 10 minutes to produce such a report. Compared with the one-question-one-answer mode we are used to, that sounds like a long time; compared with the actual workload, though, the AI is still far more efficient than a human.
If the first demonstration mainly shows its research ability, another demonstration directly shows its practical ability.
The researcher asked it to make a batch of notebook stickers of the team's mascot (a cute dog named Bernie) and to place an order for 500 of them.
The Agent used the terminal to call an image-generation tool (an Image Gen API) and produced an anime-style illustration of the dog to serve as the sticker design.
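The article does not say exactly how the terminal invokes that Image Gen API, but the step is conceptually just a script that calls an image-generation endpoint and saves the output. Here is a rough sketch using the OpenAI Python SDK; the model name and prompt are placeholders, not what the demo actually used:

```python
# Rough sketch of the image-generation step; the demo's actual API call and
# parameters are not public, so the model name and prompt below are placeholders.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="gpt-image-1",  # placeholder model name
    prompt="Cute anime-style illustration of a dog named Bernie, flat sticker design",
    size="1024x1024",
)

# The image comes back base64-encoded; decode it and write it to disk
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("bernie_sticker.png", "wb") as f:
    f.write(image_bytes)
```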
Then the Agent opened the browser, went to the Sticker Mule website, uploaded the design, filled in the sticker quantity and size, and added the products to the shopping cart.
Finally, it actively asked the user for confirmation: "Should we use this illustration?" "Should we continue to place the order?" "Do you want to enter your credit card information yourself, or let me continue to complete it?"
The task paused at the step of waiting for the user to enter credit card information; up to that point it had taken 7 minutes.
Using the same abilities, the Agent also connected to the Google Drive API (comparable to domestic cloud-storage services) on its own, read the files there, and generated a PPT.
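The article does not describe the tooling behind that step either, but turning text pulled from cloud documents into slides is straightforward from a terminal. A minimal sketch with the python-pptx library; the outline data below is invented for illustration, and reading the Drive files themselves is omitted:

```python
# Minimal sketch: turn an outline (e.g., text summarized from Drive documents)
# into a .pptx file. The outline content here is invented for illustration.
from pptx import Presentation

outline = [
    ("Q3 Review", "Key metrics and highlights pulled from the shared folder"),
    ("Next Steps", "Action items collected from the planning documents"),
]

prs = Presentation()
for title, body in outline:
    slide = prs.slides.add_slide(prs.slide_layouts[1])  # "Title and Content" layout
    slide.shapes.title.text = title
    slide.placeholders[1].text = body

prs.save("summary.pptx")
```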
It also queried a season schedule and generated a detailed travel spreadsheet plus a travel guide with an annotated map. This more complex task took the Agent about 25 minutes.
02 Mentioned almost in passing: the AI's capabilities have improved again
The new Agent mode OpenAI has launched is not actually a brand-new invention; it is a combination of two tools OpenAI released in the first half of the year: Operator and Deep Research.
Operator is a browser-based Agent tool, originally available only to Pro users, that can interpret graphical user interfaces and perform operations on them.
Deep Research is an in-depth research and analysis tool that can read a large number of web pages and directly generate a research report.
OpenAI said that while the two tools were live, it found that many of the prompts Operator users wrote were really more like Deep Research tasks, such as "Plan a trip and make the reservations." Meanwhile, Deep Research users were clamoring for the ability to "log in to websites and access protected resources," something Operator had long been able to do. So the team decided to merge the two products.
This matches the team culture described by a recently departed OpenAI engineer: OpenAI highly values engineers' self-direction, there are often several similar projects moving forward at the same time, and anyone who wants to push one can do so.
The merger of Operator and Deep Research seems to have gone well. Two Agent projects pushed forward from different angles finally converged, with some happy chemistry. It also avoids the inefficiency of reading text material only through the browser's graphical interface, so producing an in-depth report does not take too long.
OpenAI also described how the model is trained once it has been given multiple tools.
The approach is still reinforcement learning. At first, the model "clumsily" tries every tool on even a relatively simple problem; in other words, it cannot yet judge which tool is the better fit.
By rewarding its more efficient and reasonable behaviors in solving problems, the model can gradually learn how to use these tools and in which situations each tool is most suitable.
For example, when creating a creative work, it will first search for public resources; then use the terminal to write code and compile the work; finally, use the visual browser to verify the results.
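OpenAI has not released its training details, so the following is only a toy illustration of the idea: a simple epsilon-greedy learner that is rewarded for picking the tool that handles a task efficiently, and gradually converges on the right tool for each task type. The task and tool definitions are invented for the example.

```python
# Toy illustration of reward-driven tool selection (not OpenAI's actual method).
# An epsilon-greedy learner discovers which of three tools best fits each task
# type, mirroring the idea of rewarding efficient, reasonable behavior.
import random
from collections import defaultdict

TOOLS = ["text_browser", "visual_browser", "terminal"]
# Invented ground truth: which tool actually handles each task type efficiently.
BEST_TOOL = {"read_docs": "text_browser",
             "click_checkout": "visual_browser",
             "build_pptx": "terminal"}

q = defaultdict(float)      # estimated value of each (task_type, tool) pair
counts = defaultdict(int)
epsilon = 0.1

for step in range(5000):
    task = random.choice(list(BEST_TOOL))
    if random.random() < epsilon:                       # explore
        tool = random.choice(TOOLS)
    else:                                               # exploit current estimate
        tool = max(TOOLS, key=lambda t: q[(task, t)])
    reward = 1.0 if tool == BEST_TOOL[task] else -0.2   # efficient choice is rewarded
    counts[(task, tool)] += 1
    q[(task, tool)] += (reward - q[(task, tool)]) / counts[(task, tool)]

for task in BEST_TOOL:
    learned = max(TOOLS, key=lambda t: q[(task, t)])
    print(f"{task}: learned to prefer {learned}")
```

In the real system the reward presumably reflects task success and efficiency rather than a hard-coded answer key, but the shape of the learning signal is similar.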
Amid the string of demos, OpenAI also casually dropped a new benchmark result.
On Humanity's Last Exam, the Agent-mode model, with access to the browser, computer, and terminal, scored 42%, roughly double the score of o3 without any tools.
The score is also at the global frontier; for comparison, xAI announced that the tool-enabled Grok 4 Heavy scored 45% on the same test.
Tool use also brings a further improvement in advanced mathematical reasoning.
Two of the benchmarks OpenAI announced compare the model against humans: one measures the ability to operate web pages (WebArena), the other the ability to operate spreadsheets (SpreadsheetBench). On both, Agent mode still falls short of humans, but on web-page operations it comes close to human level.
This means that even by combining tools that individually fall short of human performance, the model can still gain a significant boost in capability. In the Agent era, the ceiling on model capability is clearly higher.
03 The era of coexisting with Agent has really arrived
There is no doubt that Agent is the absolute hot topic in the AI field in 2025.
But beneath the buzz, users' real-world experience has often been imperfect: tasks take too long to run, and slightly complex ones frequently go wrong. An early Operator user put it this way: "Every click and scroll is like swimming on a hot summer day."
OpenAI's merger of Operator and Deep Research may be aimed precisely at relieving that "sticky" feeling and making the Agent genuinely useful.
With OpenAI stepping in directly, a blunter question now faces every third-party developer in the Manus mold: will this give rise to a flourishing ecosystem of Agent applications, or simply crush the startups? The answer is still unclear.
For users, a more immediate challenge follows: privacy and security.
When AI clicks on a web page and enters our personal information in a virtual machine that we can't see, who will guarantee the security?
If it is deceived by a phishing website and loses our credit card number, who will be responsible?
OpenAI's response is that it will apply extremely strict review and security measures, but it also hopes society as a whole will take the time to adapt and establish norms.
The Agent era is indeed a completely different new stage after the Chat era.
In the Chat era, we learned to adapt to AI's "mouth": we gradually got used to its hallucinations and learned to tell truth from falsehood in its smooth talk. That was a challenge of "information credibility."
In the Agent era, the challenge shifts entirely to AI's "hands." We have to answer a new set of questions: How much do we really trust AI? How much authority are we willing to hand over, and how many real-world tasks are we willing to let it complete for us?
Our relationship with AI will also be redefined because of this.
From a broader perspective, the explosion of Agents also puts an old question in front of us more sharply: when AI can really "do the work," what happens to our jobs?
When AI can independently complete a complex report involving data retrieval and image verification, and can finish online reservations on its own, will white-collar work be augmented and accelerated, or thoroughly threatened? The answer is still up in the air.
But whether we welcome it, fear it, or are simply confused by it, a more automated new era driven by Agents really is accelerating toward us.
This article is from the WeChat official account "GeekPark" (ID: geekpark), author: Li Yuan. It is published by 36Kr with authorization.