Just now, Altman unveiled ChatGPT's "unified agent"! Viewers are calling it a real taste of AGI, and the most competitive office worker yet has arrived.
[Xinzhiyuan Digest] ChatGPT agent is here! Altman led a late-night livestream. The first unified agent seamlessly integrates three major AI products, thinks and makes decisions autonomously, and can create slide decks and Excel spreadsheets directly online. In 2025, ChatGPT is becoming a brand-new AI lever, unlocking a new "super-individual" way of working.
Tonight, ChatGPT, Deep Research, and Operator, the "Three Musketeers", are joining forces for the first time!
Altman personally led the team. In a high-energy 25-minute livestream, ChatGPT agent was officially launched, opening a new era of collaboration between humans and agents.
The core of ChatGPT agent is a unified intelligent agent system.
In short, it combines the advantages of the previous three technological breakthroughs: Operator's ability to interact with websites, Deep Research's skills in integrating information, and ChatGPT's advantage in intelligent dialogue.
Now, ChatGPT can directly use a computer and work for you autonomously throughout the process.
It can intelligently browse web pages, filter results, prompt you to log in securely when needed, run code, conduct analysis, and create slide decks and Excel spreadsheets that summarize its findings.
Most importantly, everything is under control.
Humans can interrupt tasks, take over the browser, or stop the process completely at any time.
On the HLE (Humanity's Last Exam) benchmark, ChatGPT agent scored 41.6%; on the FrontierMath mathematics benchmark it also set a new SOTA, beating the o4-mini and o3 models by a wide margin.
That said, ChatGPT agent still trails Musk's Grok 4 Heavy on HLE.
Who would have thought that the slide above was made by ChatGPT agent itself? In benchmark tests, its ability to operate office software leaves humans little room to compete.
Commenters put it bluntly: the good days for office workers are over.
Altman remarked that watching ChatGPT agent use a computer to carry out complex tasks was a real "feel the AGI" moment for him.
Starting today, Pro, Plus, and Team users can try it directly: just select "Agent mode" from the drop-down menu in the chat box.
Pro users get a quota of 400 uses per month, while Plus and Team users get 40 per month.
TL;DR (excerpted from an X post by Xikun Zhang, a researcher at OpenAI):
Deep Research is good at doing research, Operator can perform operations, and ChatGPT agent can complete all these tasks simultaneously!
The power of end-to-end reinforcement learning! Built on RL scaling, ChatGPT agent is impressively efficient in both compute and data use.
Human-machine collaboration remains the core! You can interrupt a task at any point and steer ChatGPT toward a new one. It actively asks for confirmation before operations such as payments and file deletion, and otherwise asks questions only when it needs clearer instructions.
Real-world performance > chasing benchmark rankings! ChatGPT agent has indeed topped many leaderboards. However, during model development, OpenAI neither focused solely on high scores nor cared much about its final position in the rankings.
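The confirmation behavior described in the TL;DR above can be pictured with a minimal sketch. This is not OpenAI's implementation; the `Action` type, the `SENSITIVE_KINDS` set, and the `confirm` callback are all hypothetical names introduced here purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical categories of actions that need human sign-off.
# The article names payments and file deletion as examples.
SENSITIVE_KINDS = {"payment", "delete_file"}


@dataclass
class Action:
    kind: str          # e.g. "browse", "payment", "delete_file"
    description: str   # human-readable summary shown to the user


def run_actions(actions: List[Action],
                confirm: Callable[[Action], bool]) -> List[str]:
    """Run actions in order, pausing for confirmation on sensitive ones."""
    log = []
    for action in actions:
        # Only sensitive actions trigger a question to the human.
        if action.kind in SENSITIVE_KINDS and not confirm(action):
            log.append(f"skipped: {action.description}")
            continue
        log.append(f"done: {action.description}")
    return log
```

In this sketch, routine steps such as browsing run without interruption, while a declined payment is skipped rather than executed.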
The first combination of the three powerhouses
ChatGPT agent makes its official debut
In January this year, OpenAI released its first agent, Operator, which lets AI interact directly with a GUI the way a human does.
Then, in early February, it launched Deep Research, which lets a reasoning model use tools directly to conduct research.
These two tools each have their own strengths. Operator can autonomously surf the Internet, click, and input, while Deep Research is good at analyzing and summarizing information.
However, the former cannot conduct in-depth analysis or write detailed reports, and the latter cannot interact with websites to obtain precise results.
Today, OpenAI has officially merged the two into one, ChatGPT agent, so that a "single model" can unlock new capabilities.
ChatGPT agent is equipped with a complete set of tools:
· Visual browser: Used for interacting with web pages through the graphical user interface
· Text browser: Used for simpler, reasoning-based web queries
· Terminal and direct API access, for example an image API
The agent can also connect to applications such as Gmail and GitHub through ChatGPT connectors, making it easy to find relevant information and respond according to the prompt.
Moreover, by taking over the browser, you can log in to any website, enabling ChatGPT agent to carry out deeper and broader research and task execution.
Thus, ChatGPT can choose the best path to execute tasks efficiently.
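To make the idea of choosing among tools concrete, here is a minimal sketch. According to the article, the real agent learns this choice through reinforcement learning; the hand-written rules, the `Tool` enum, and the `needs_code` / `needs_gui` flags below are hypothetical stand-ins used only for illustration.

```python
from enum import Enum


class Tool(Enum):
    VISUAL_BROWSER = "visual browser"   # GUI interaction with web pages
    TEXT_BROWSER = "text browser"       # simpler web queries
    TERMINAL = "terminal"               # running code and analysis


def choose_tool(step: dict) -> Tool:
    """Pick a tool for one step using simple, hand-written rules."""
    if step.get("needs_code"):
        return Tool.TERMINAL
    if step.get("needs_gui"):
        return Tool.VISUAL_BROWSER
    # Default to the lighter-weight text browser for plain lookups.
    return Tool.TEXT_BROWSER
```

For example, a step that only searches for product listings would go to the text browser in this sketch, while one that requires clicking through a checkout page would go to the visual browser.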
ChatGPT takes over office workers' slide-making
To show what ChatGPT agent can do, the team ran a real-world scenario live: getting ready for the wedding of friends Minnie and Sarah.
According to the prompt, the task requires the AI to recommend attractive, reasonably priced outfits based on the dress code and weather conditions, book hotels for the guests, and prepare wedding gifts for the newlyweds.
After understanding the prompt, ChatGPT agent did not directly produce a report but restated and confirmed the task requirements, such as the exact wedding date.
After everything was confirmed, it then autonomously opened the browser and displayed each step of the execution process, that is, the chain of thought, on the interactive page.
Note that the agent runs the task in a virtual computer environment that is set up within a few seconds.
During task execution, the agent used the text browser to search and found suitable suits, then switched to the visual browser and waited for confirmation.
While ChatGPT is executing the wedding planning task, you can also assign it another task: buying a pair of size 9.5 black shoes.
This means that ChatGPT agent is not afraid of being interrupted. Even if the previous task has a long planning time, it will not affect the subsequent tasks.
Finally, ChatGPT agent generated a very comprehensive report, including plans and suggestions for dresses, hotels, shoes, and gifts.
In another demonstration, the team used the ChatGPT app to start a task: uploading a picture of the team's mascot, a cute puppy, to make laptop stickers and ordering 500 of them.
It then called its image-generation tool to produce an anime-style picture, designed the stickers, and ordered 500 copies from Sticker Mule to be sent to xxx.
Even more surprisingly, ChatGPT agent can also pull evaluation data through connectors such as Google Drive and generate slide decks by itself.
During this process, the agent writes code and compiles it into the final slides. It also uses image tools to decorate the slides.
After a short while, it produced a first slide deck on HLE and FrontierMath, though it was not very polished. It was then continuously refined through RL.
Finally, a polished slide file was obtained, which can be opened directly in office software.