
Just now, OpenAI's general intelligent agent, ChatGPT Agent, officially made its debut.

机器之心 (Synced), 2025-07-18 11:43
The biggest upgrade to date.

ChatGPT can now think about actions, actively select tools, and use its own virtual computer to complete tasks for you.

The era of Agent AI has arrived earlier than we thought.

Early Friday morning Beijing time, OpenAI suddenly launched a live broadcast of a new product.

What was released this time is the brand-new ChatGPT Agent, a key upgrade in general-agent capabilities.

Unlike previous upgrades to base large models, a general agent can automatically plan across multiple tools to help people complete complex tasks, such as browsing the user's calendar, generating editable slide decks, and running code. The agent can connect to your Gmail and GitHub accounts to obtain information and solve problems, and it can use APIs to access various applications. Agentic tool use also brings a significant gain in measured capability: the model behind ChatGPT Agent scored 41.6% on the HLE benchmark, almost twice the score of o3 and o4-mini.

ChatGPT Agent is currently available to subscribers of OpenAI's Pro, Plus, and Team plans. Users who want to use it can select "Agent Mode" from the tool dropdown menu in ChatGPT.

OpenAI said that enterprise and education users are expected to get the new features later this summer. At launch, Pro users will typically be able to send up to 400 agent prompts per month, while other paying users get up to 40. It is still unclear when this feature will be available to free ChatGPT users.

This is OpenAI's boldest product launch to date. From now on, ChatGPT is an agent product that can take actions and take on tasks for people, going far beyond simply answering questions.

OpenAI CEO Sam Altman said that watching the ChatGPT agent use a computer to perform complex tasks was a real "AGI-feeling" moment for him. Seeing the computer think, plan, and execute brings a different kind of experience.

ChatGPT can now use its own virtual computer to complete work for you and handle complex tasks from start to finish. Users can ask ChatGPT to carry out requests such as "query the annual financial report", and it will intelligently browse websites, filter results, prompt you to log in securely when needed, run code, conduct analyses, and even deliver editable slides and spreadsheets that summarize its findings.

For example, ask ChatGPT Agent to "search for the annual comprehensive financial report of San Francisco (2020-2024)":

Another example is to enter the prompt "I'm a tennis fan and want to go to Palm Springs to watch tennis matches, especially during the semi-finals/finals. I live in San Francisco. Please help me create a detailed three-day itinerary, including flight arrangements, hotel reservations, and activities (matches, hiking, food, spa, etc.). I like hiking, vegan restaurants, and spas. The total budget is $3000. This itinerary needs to include: precise time arrangements; details, costs, and other information for each activity; ticket-buying or reservation links if necessary", and then let ChatGPT Agent create a detailed itinerary for you:

The core of this new ability is a unified agentic system. It combines the strengths of three earlier breakthroughs: Operator's ability to interact with websites, deep research's ability to synthesize information, and ChatGPT's reasoning and conversational ability.

With its own virtual computing environment, ChatGPT can flexibly switch between reasoning and execution, and handle complex workflows from start to finish according to the user's instructions. Most importantly, the user is always in control. ChatGPT will ask for your permission before performing any important operations, and you can also interrupt the task, take over the browser, or stop the operation at any time.

OpenAI said, "Although ChatGPT Agent can already handle complex tasks, this launch is just the beginning. We will continue to iterate and regularly introduce major improvements to make it more powerful and practical, serving more users."

The Natural Evolution of Operator and Deep Research

In the past, Operator and deep research each had unique advantages: Operator could scroll, click, and input on web pages, while deep research was good at analyzing and summarizing information.

However, they work best in different scenarios, and each has areas where it falls short. Operator cannot conduct in-depth analysis or write detailed reports, while deep research cannot interact with web pages, further filter results, or access content that requires user login.

OpenAI found that many tasks that users tried to handle with Operator were actually more suitable for deep research, so they decided to integrate the advantages of the two.

By integrating these complementary abilities into ChatGPT and introducing more tools, OpenAI has unlocked new capabilities in one model. It can now actively interact with websites (clicking, filtering, and so on) and collect more accurate results more efficiently. Users can also move seamlessly from natural conversation to issuing specific operation requests within the same conversation.

OpenAI has equipped ChatGPT Agent with a complete set of tools: a visual browser that interacts with web pages through a graphical user interface, a text browser for handling simple reasoning-type web queries, a terminal (command-line interface), and the ability to call APIs directly.

The agent can also use ChatGPT Connectors to connect to applications such as Gmail and GitHub, enabling ChatGPT to find information relevant to your prompt and use it in the answer. Users can also take over the browser and log in to accounts on any website, allowing the agent to carry out more in-depth and extensive information retrieval and task execution.

Providing ChatGPT with multiple ways to access and interact with web information means that ChatGPT Agent can choose the most efficient path to complete a task. For example, it can obtain the user's calendar information through an API, use the text browser to process large amounts of text efficiently, and use the visual browser to interact with websites designed for humans.
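The idea of picking one access path per task can be illustrated with a small sketch. The source does not describe how ChatGPT Agent actually makes this choice (presumably the model decides on its own), so the hand-written rules below, along with the names Tool, Task, and choose_tool, are hypothetical and serve only to make the trade-off concrete.

```python
from dataclasses import dataclass
from enum import Enum


class Tool(Enum):
    """The four access paths named in the article."""
    API = "api"
    TEXT_BROWSER = "text_browser"
    VISUAL_BROWSER = "visual_browser"
    TERMINAL = "terminal"


@dataclass
class Task:
    """Hypothetical task description; every field is an illustrative assumption."""
    description: str
    has_api: bool = False      # structured access exists (e.g. a calendar API)
    needs_code: bool = False   # files must be processed or code must be run
    needs_gui: bool = False    # the site is built for human point-and-click use


def choose_tool(task: Task) -> Tool:
    """Pick the cheapest path that can still do the job (assumed rule ordering)."""
    if task.has_api:
        return Tool.API
    if task.needs_code:
        return Tool.TERMINAL
    if task.needs_gui:
        return Tool.VISUAL_BROWSER
    # Default: plain text pages are handled by the text browser.
    return Tool.TEXT_BROWSER


calendar = Task("read this week's calendar entries", has_api=True)
booking = Task("fill in a hotel reservation form", needs_gui=True)
print(choose_tool(calendar).value, choose_tool(booking).value)
```

The ordering encodes one plausible preference (structured access first, graphical interaction only when required); a real agent would weigh these options dynamically rather than follow fixed rules.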

All these operations are completed on ChatGPT Agent's own virtual computer, which preserves the context required for the task as the agent moves between tools. ChatGPT Agent can choose to open web pages with the text browser or the visual browser as needed, download files from the Internet, run commands in the terminal to process files, and then view the output through the visual browser. It also adjusts its strategy to the task in order to execute quickly, accurately, and efficiently.

ChatGPT Agent is designed for iterative and collaborative workflows and is far more interactive and flexible than previous models. While ChatGPT is carrying out a task, the user can interrupt it at any time to clarify instructions, steer it in the intended direction, or change the task entirely. It will continue to work based on the new information without losing its previous progress.

Similarly, ChatGPT will also actively ask the user for more details when needed to ensure that the task always aligns with the goal. If a task takes longer than expected or gets stuck, the user can choose to pause the task, request a progress summary, or directly terminate the task and obtain the partial results already available. If the user installs the ChatGPT app on their phone, it will also send a notification when the task is completed.

Benchmark Test Results: Expanding Real-World Practicality

The improved capabilities of ChatGPT Agent and its underlying model are reflected in top-tier performance on multiple benchmarks, covering web browsing and the ability to complete real-world tasks.

On "Humanity's Last Exam" (HLE), an evaluation that measures AI performance on expert-level questions across many fields, the model behind ChatGPT Agent achieved a pass@1 score of 41.6%.

Because the agent can plan dynamically and select tools on its own, it can approach the same task in different ways. When scaled up with a simple parallel strategy, running up to eight attempts simultaneously and selecting the result with the highest self-reported confidence, the agent's HLE score rose to 44.4%.
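The parallel strategy described above is a form of best-of-n selection, which can be sketched in a few lines. This is a minimal illustration and not OpenAI's implementation: run_attempt is a hypothetical stand-in for one independent agent rollout, and the confidence values are random numbers from a seeded generator rather than real self-reports.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Attempt:
    answer: str
    confidence: float  # self-reported by the attempt, assumed to lie in [0, 1]


def run_attempt(question: str, seed: int) -> Attempt:
    """Hypothetical stand-in for one agent rollout (no real model is called)."""
    rng = random.Random(seed)
    return Attempt(answer=f"candidate-{seed}", confidence=rng.random())


def select_best(attempts: list[Attempt]) -> Attempt:
    """Keep the attempt that reports the highest confidence."""
    return max(attempts, key=lambda a: a.confidence)


def best_of_n(question: str, n: int = 8) -> Attempt:
    """Run n attempts concurrently, then select by self-reported confidence."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        attempts = list(pool.map(lambda seed: run_attempt(question, seed), range(n)))
    return select_best(attempts)


best = best_of_n("an expert-level question", n=8)
print(best.answer, round(best.confidence, 3))
```

Selection here relies only on each attempt's own confidence, so no ground-truth answer is needed at inference time; the gain depends on how well that confidence tracks correctness.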

FrontierMath is currently the most difficult known mathematics benchmark, containing new and unpublished problems that usually take mathematicians hours or even days to solve. With the ability to use tools (such as accessing the terminal to execute code), ChatGPT Agent achieved an accuracy of 27.4% on this benchmark, far surpassing all previous models.

OpenAI also evaluated the model on a benchmark that simulates complex real-world tasks. On an internal benchmark that assesses the modeling work expected of first- to third-year investment banking analysts, such as creating a three-statement financial model with proper formatting and references for a Fortune 500 company, the model behind ChatGPT Agent significantly outperformed deep research and o3.

OpenAI also evaluated ChatGPT Agent on BrowseComp, a benchmark OpenAI released earlier this year to measure the ability of browsing agents to find hard-to-obtain information on the Internet. ChatGPT Agent set a new SOTA (state of the art) on this benchmark, scoring 68.9%, which is 17.4 percentage points higher than deep research.

Finally, on WebArena, a benchmark that evaluates the ability of web-browsing agents to complete real-world web tasks, ChatGPT Agent outperformed the o3-powered CUA (the model that powers Operator).

For more details of the benchmark tests, please refer to the ChatGPT agent system card:

System card address: https://cdn.openai.com/pdf/839e66fc-602c-48bf-81d3-b21eacc3459d/chatgpt_agent_system_card.pdf

Finally, Sam Altman posted a long tweet introducing the security limitations of ChatGPT Agent.

Agent represents a new level of capability for AI systems. It can use its own computer to complete some complex tasks for you. It combines the essence of Deep Research and Operator, but its actual functions far exceed expectations: it can think for a long time, use some tools, think some more, take some actions, think some more, and so on.

For example, at the launch event, we showed a demonstration of preparing for a friend's wedding