ChatGPT Agent: Nothing Manus can't already do, but a glimmer of hope for a truly end-to-end agent
Agents are the biggest point of consensus in AI circles this year, and OpenAI naturally can't afford to fall behind.
At 1 a.m. Beijing time on July 18, 2025, Sam Altman and four OpenAI researchers officially released ChatGPT Agent, a general-purpose AI Agent, during a live broadcast.
With Manus, Lovart, and Flowith having come first, the use cases demonstrated for ChatGPT Agent aren't particularly astonishing. The significance of the release, however, goes beyond the features themselves.
The revolutionary aspect of ChatGPT Agent lies in its technical approach: it actively selects agentic skills from a toolbox and uses its own computer to complete tasks. Users can watch the AI work inside the virtual environment in real time.
Although the interface resembles products like Manus, the underlying principles are fundamentally different. Manus calls multiple underlying models, an approach akin to "external stitching," whereas ChatGPT Agent internalizes the agent capabilities within the model itself. In it, we can already see the prototype of an end-to-end general-purpose agent.
According to OpenAI, the Operator and Deep Research teams were merged into a single team of 20 to 35 people to build ChatGPT Agent.
According to the ChatGPT Agent system card, it is a new agentic model in the same family as OpenAI o3, trained end to end. It is a single model developed for agentic tasks, not an engineered combination of multiple models.
Judging from the comparison slides OpenAI released, this training was done largely through reinforcement learning, a path that is probably similar to that of Grok 4 with tools.
After this retraining, the agent combines Deep Research's multi-step research and high-quality report generation, Operator's ability to carry out tasks through a remote visual browser environment, a terminal tool with limited network access, and the ability to reach external data sources and applications through connectors.
After completing a complex task, it can also deliver a downloadable slide deck or document to the user.
For Manus, this move by OpenAI is undoubtedly a heavy blow. Even on price there is little to separate the two: the ChatGPT Plus plan gives users access to ChatGPT Agent for $20 per month, while Manus's basic plan costs $19 per month.
Key points:
ChatGPT Agent is a unified AI agent capable of performing complex, multi-tool tasks.
It integrates access to a text browser, a GUI browser, a terminal, and image generation tools.
It supports interactive, multi-turn conversations with users, allowing interruption and clarification.
Upgraded safety protections: stronger defenses against malicious prompts embedded in web pages; automatic refusal of high-risk tasks; and biological and chemical risks handled under the highest-level safety stack.
It has achieved state-of-the-art results on multiple real-world and benchmark tasks.
Overview of ChatGPT Agent: Functionality similar to Manus
At its core, ChatGPT Agent is a unified agentic system that integrates and extends two earlier OpenAI research projects: "Operator" (focused on website interaction) and "Deep Research" (focused on information synthesis).
This enables ChatGPT Agent to seamlessly switch from reasoning and thinking to executing specific actions within a single conversation flow.
Virtual computer environment: ChatGPT Agent carries out every task on a dedicated virtual computer. The environment is sandboxed for operational security and preserves the task context, so even if the user interrupts or changes the instructions midway, the agent can pick up where it left off without losing progress.
Intelligent toolbox: To complete complex workflows, the Agent is equipped with four tools and can automatically select the most suitable tool according to task requirements:
Visual Browser: Used to interact with graphical user interfaces, such as clicking buttons, filling out forms, and browsing websites designed for humans.
Text-based Browser: Used for web queries that call for efficient reasoning over large amounts of text.
Terminal: Lets the agent run code and download and process files.
API Access: Calls APIs directly to obtain information, for example reaching data in applications such as Google Drive, Gmail, and GitHub through connectors.
Driven by a new model: ChatGPT Agent is powered by a new model developed specifically for it. The model was trained with reinforcement learning on complex tasks that require multiple tools, and in the process learned to switch smoothly between tools and use them in combination.
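To make the toolbox idea concrete, here is a minimal sketch of what choosing among the four tools could look like as an interface. It is only an illustration: according to the article, the real agent picks tools through a policy learned with reinforcement learning, not through hand-written rules like these. Every name below (`Tool`, `Step`, `choose_tool`, and the flags on `Step`) is hypothetical and not part of any OpenAI API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tool(Enum):
    """The four tools named in the article (identifiers are hypothetical)."""
    VISUAL_BROWSER = "visual_browser"
    TEXT_BROWSER = "text_browser"
    TERMINAL = "terminal"
    API_ACCESS = "api_access"


@dataclass
class Step:
    """One step of a task, described by hand-labelled needs (an assumption)."""
    description: str
    needs_gui: bool = False          # clicking buttons, filling out forms
    needs_code: bool = False         # running code, processing files
    connector: Optional[str] = None  # e.g. "gmail", "github"


def choose_tool(step: Step) -> Tool:
    """Toy rule-based stand-in for the learned tool-selection policy."""
    if step.connector is not None:
        return Tool.API_ACCESS
    if step.needs_code:
        return Tool.TERMINAL
    if step.needs_gui:
        return Tool.VISUAL_BROWSER
    # Default: plain reading and searching over large amounts of text.
    return Tool.TEXT_BROWSER


plan = [
    Step("read the calendar", connector="google_calendar"),
    Step("search recent news about the client"),
    Step("fill in the booking form", needs_gui=True),
    Step("aggregate figures into a spreadsheet", needs_code=True),
]
chosen = [choose_tool(step) for step in plan]
```

The point of the sketch is the shape of the decision (one step in, one tool out), not the rules themselves; in the end-to-end model described above, that mapping is learned rather than engineered.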
It has the following features:
Autonomous task execution: Users can issue instructions in natural language, such as "Analyze my calendar and brief me on the upcoming client meeting based on recent news." The Agent can autonomously plan and execute a series of operations, such as browsing websites, filtering information, running code for analysis, and finally generating editable slides or spreadsheets as results.
Collaboration and interactivity: When it needs more detail to reach the goal, it proactively asks for it. Users can interrupt, redirect the task, or take full control of the browser at any time.
Security and permission control: Security is a core part of the design. Before performing consequential actions with real-world impact, such as making purchases, submitting forms, sending emails, or handling personal information, the agent explicitly asks the user for permission. It is also barred from high-risk tasks such as financial transfers or giving legal advice. In addition, OpenAI has built in safeguards against malicious attacks such as "prompt injection."
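The permission rules just described amount to a three-way gate: some actions are always refused, some run only after the user confirms, and everything else proceeds. A minimal sketch of that logic follows. The action categories are taken from the article, but the identifiers (`Action`, `gate`, `ask_user`) and the string labels are assumptions made for illustration, and nothing here reflects how OpenAI actually implements the check.

```python
from dataclasses import dataclass
from typing import Callable

# Categories named in the article; the string labels are hypothetical.
CONFIRM_FIRST = {"purchase", "submit_form", "send_email", "process_personal_data"}
ALWAYS_REFUSE = {"financial_transfer", "legal_advice"}


@dataclass
class Action:
    kind: str
    detail: str


def gate(action: Action, ask_user: Callable[[Action], bool]) -> str:
    """Decide what happens to an action before it is carried out."""
    if action.kind in ALWAYS_REFUSE:
        # High-risk tasks are rejected without consulting the user.
        return "refused"
    if action.kind in CONFIRM_FIRST:
        # Consequential actions need explicit user permission.
        return "executed" if ask_user(action) else "cancelled"
    # Everything else (browsing, reading, analysis) proceeds directly.
    return "executed"
```

For example, `gate(Action("send_email", "weekly report"), ask_user=lambda a: False)` returns `"cancelled"`, while a `"financial_transfer"` action returns `"refused"` regardless of what the user would answer.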
Setting "records" on multiple benchmarks
On HLE, the hardest of them, it reaches 41.6% (with tools), ahead of the just-released Grok 4 (with tools) at 41.0%.
On Humanity's Last Exam (HLE), which measures expert-level questions across a broad range of subjects, its single-attempt accuracy reaches 41.6%. Running eight attempts in parallel and selecting the answer with the highest confidence raises that to 44.4%.
On the extremely difficult FrontierMath benchmark, accuracy rises to 27.4% when the agent can run code in the terminal.
In an internal evaluation of real-world knowledge-work tasks, ChatGPT Agent matched or outperformed humans in about half of the cases.
On DSBench, a benchmark of real-world data science tasks, its accuracy on analysis and modeling reaches 89.9% and 85.5% respectively, far above the average human level.
Its ability to edit spreadsheets directly also leads the field: it scored 45.5% on SpreadsheetBench, versus 20% for Copilot in Excel. It also set new state-of-the-art results on browsing evaluations such as BrowseComp and WebArena.
(Figure note on evaluation method, from OpenAI: The SpreadsheetBench authors evaluated spreadsheets using Microsoft Excel in a Windows environment. OpenAI used LibreOffice in an OSX environment, which may cause slight differences in scores. For example, the authors reported that GPT-4o achieved 15.02% on Overall Hard Restriction, while OpenAI measured 13.38%. OpenAI used the full 912-question benchmark.)
According to slides produced by ChatGPT Agent itself, its slide-making and web-browsing abilities have improved significantly over plain base models, but a considerable gap to human performance remains.
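The higher Humanity's Last Exam figure above comes from running eight attempts in parallel and keeping the answer with the highest confidence, which is a form of best-of-N selection. The sketch below shows that selection step only. How the confidence score is produced is not described in the source, so it is assumed here to be a number returned alongside each answer; `best_of_n`, `solve`, and `fake_solve` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

# An attempt is (answer, confidence); the confidence scale is an assumption.
Attempt = Tuple[str, float]


def best_of_n(solve: Callable[[int], Attempt], n: int = 8) -> Attempt:
    """Run n independent attempts in parallel and keep the most confident one."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        # Each attempt gets its index as a seed so runs can differ.
        attempts = list(pool.map(solve, range(n)))
    return max(attempts, key=lambda attempt: attempt[1])


def fake_solve(seed: int) -> Attempt:
    """Deterministic stand-in for a model call, used only for demonstration."""
    return (f"answer-{seed}", seed / 10)


result = best_of_n(fake_solve, n=8)  # ("answer-7", 0.7)
```

Selecting by confidence differs from majority voting: a single high-confidence attempt wins even if the other seven agree on something else, so the gain depends on how well calibrated the confidence scores are.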
It's not a future product; it's available today
Starting today, Pro users can use it immediately. Plus and Team users will gain access within a few days, and the Enterprise and Education tiers will follow in a few weeks.
Pro users get 400 messages per month, while other paying users have a monthly quota of 40 messages, which can be extended through flexible pay-as-you-go billing.
Using it is simple: switch to "agent mode" in any conversation and describe the goal, such as in-depth research, building a presentation, or filing an expense claim. The agent's work is displayed in real time on the left side of the screen; if a login is required, the system switches to "takeover mode" so the user can enter credentials safely.
Users can also schedule completed tasks to repeat, for example automatically generating a metrics report every Monday.
Altman personally warns of risks: Agent is powerful but also dangerous
Notably, right after the launch event, Altman posted a long message warning of the risks of using ChatGPT Agent.
After highlighting ChatGPT Agent's power in handling complex tasks, he issued a pointed warning about the product's risks: the specific impacts are not yet clear, but bad actors may try to "trick" users' AI agents into giving up private information they shouldn't and taking actions they shouldn't, in ways that cannot be predicted.
The model may come into contact with users' sensitive data or be hit by malicious "prompt injection" on web pages. For this reason, OpenAI has carried over the strict controls from the Operator era and added multiple layers of protection:
Explicit user authorization must be obtained before key actions;
A "supervision mode" is enabled for certain high-risk tasks (such as sending emails), requiring the user to watch the whole process;
It proactively refuses high-risk instructions such as bank transfers;
Users can clear browsing data and log out of all sessions with one click, or disable connectors when online access isn't needed.
In terms of biological and chemical safety, OpenAI