
Will OpenAI kill the Manuses?

山上 2025-07-21 08:07
Listen: that's the sound of countless startups quietly vanishing into thin air.

Just as OpenAI tried to end the AI Agent startup race early with its text-to-image update in March, it is now making another such move.

Early in the morning of July 18th, Beijing time, OpenAI released ChatGPT Agent. It can automatically plan execution steps from user instructions, call multiple tools, and complete multi-step tasks ranging from data scraping to table generation and from itinerary planning to hotel booking.

Screenshot of OpenAI's tweet

This is also the direction that most AI Agent startups are currently exploring. Whatever Manus demonstrated four months ago in the promo video for its so-called first general AI Agent, ChatGPT Agent can now do.

Sam Altman, the CEO of OpenAI, said, "This is the first time I've truly felt the presence of AGI (Artificial General Intelligence)." OpenAI researchers claim that ChatGPT Agent is the most powerful AI Agent model to date.

Yes, OpenAI refers to ChatGPT Agent as a model, not a product. Unlike systems such as Manus that rely on context management and tool-chain orchestration, OpenAI has trained a dedicated model that handles task planning, cross-tool invocation, and document generation within a single system. The model currently belongs to the o3 series but has not been given its own name.

Entrepreneurs in the AI era face faster technological iteration than in any earlier period. A single update to an underlying model can doom an innovative product in a vertical field.

Li Xiang, the founder of Li Auto, previously said on his WeChat Moments that on the consumer (to-C) side, companies like OpenAI that possess the most powerful base models won't leave much room for startups in vertical applications. "The essence of software is functionality, which requires scenario-based and vertical implementation. The essence of artificial intelligence is capability. A powerful AI can dominate everything, and it's also the most convenient for users."

Even Zhu Xiaohu, who has been advocating for AI application innovation, said on social media that large models will swallow up 90% of Agents. Some users on the X platform also questioned how other entrepreneurs could compete if OpenAI later opens up the API of the ChatGPT Agent model.

“Listen: that's the sound of a great many startups evaporating into the void.”

So reads a highly upvoted comment under the video of OpenAI's launch event.

Manus and Others Choose to Confront Head-on

At least for now, Manus and similar companies show no sign of backing down.

Right after OpenAI's press conference, Manus retweeted on X, saying, "Welcome to the game." Flowith, another Chinese-founded AI Agent startup, also retweeted to emphasize that it had launched an AI Agent product a year ago.

As the first startup to publicly promote the concept of a general AI Agent in the past six months, Manus reacted much more strongly than other companies. Just three hours after the press conference, Manus released ten comparison tests against ChatGPT Agent, declaring its intention to compete directly with OpenAI.

These comparison tests draw partly on the demonstration clips OpenAI showed that day and partly on users' real-world usage shared on social platforms. The scenarios include data organization, route planning, online shopping, financial analysis, and restaurant reservation. The results released by Manus show it ahead on almost every count: not only faster response times but also a greater emphasis on "task completion." For example, the tables are neater, the illustrations are richer, and the PPTs are closer to finished products.

Comparison video of Manus and ChatGPT Agent

For instance, in the task of "planning a three-day tennis trip to Palm Springs," OpenAI presented a simple itinerary, while Manus generated an itinerary poster with a destination-themed design.

Comparison tests released by Manus

Another example is the analysis of San Francisco's financial reports over the past four years. OpenAI output an Excel file, while Manus provided a complete presentation document with charts and key-point summaries. "Manus completes the entire project, not just provides data," Manus commented.

Another Chinese-founded company, Genspark, also reacted boldly. Eric Jing, the founder, wrote on X, "I never thought that one day, as a small company with only 24 employees, we could lead... lead over OpenAI." He said that with the same prompts, Genspark has shorter response times and lower costs, and the quality of the generated results is "several times higher."

On July 19th, Genspark also shared nine comparison examples against ChatGPT Agent on social platforms, showing that the documents it outputs have richer data dimensions and more aesthetically pleasing layouts. In addition to cases similar to those in Manus' comparison tests, such as travel itinerary planning and financial data analysis, it also shared a comparison of video-generation capabilities, pointing out that ChatGPT Agent failed to complete the task.

Video-generation case shared by Genspark

User reaction on social media has been less intense than when OpenAI updated its text-to-image function. Critics point out that ChatGPT Agent has a low task-completion rate and is slow: some complex tasks take 20 minutes or even longer to complete.

OpenAI seems to be aware of ChatGPT Agent's current speed issue. In several of its promotional videos, employees close their laptops after giving instructions and come back later to check the results.

“Even if it takes 15 minutes or half an hour, it's still a significant speed-up compared to doing it manually yourself,” said Isa Fulford, a researcher at OpenAI. She described this as a way of "initiating a task in the background and coming back later to check the result," adding that OpenAI's search team is more focused on low-latency scenarios.

OpenAI may care more about how long the model can keep reasoning without interruption. Zhang Xikun, a researcher at OpenAI, said that in internal tests, ChatGPT Agent's longest continuous reasoning time reached 2 hours. "We should have a leaderboard to record how long a model can continuously think."

In response to the criticism that the generated documents or PPTs aren't aesthetically pleasing, OpenAI researchers suggested on X that users first let ChatGPT Agent finish the research work and then have it output a PPT file. ChatGPT generates files in the standard pptx format, and users can also apply the desired design templates uniformly in PowerPoint.

Although OpenAI emphasizes that it trained a dedicated model for ChatGPT Agent, some critics say it looks more like a combination of the previously launched Operator (browser interaction) and Deep Research (in-depth research). Operator enables ChatGPT to interact directly with websites through the browser and to read and understand web-page content, while Deep Research is good at analyzing and summarizing information.

In fact, the current ChatGPT Agent team is drawn from the former Operator and Deep Research teams and numbers about 20-35 people. OpenAI has stated publicly that ChatGPT Agent is a natural continuation of Operator and Deep Research. "We found that many queries users made through Operator were actually more suitable for Deep Research, so we combined the advantages of the two."

OpenAI said that this release is only the first step in integrating agent functions directly into ChatGPT, and that it plans to add more functions through regular updates.

Two Technical Routes

Startups have spent the past six months iterating on engineering and tuning prompts to improve output quality and delivery experience. By comparison, the final output of the newly released ChatGPT Agent can look rough.

Startups are trying to offer users an Agent product with a higher completion rate and a lower barrier to use. Take Manus as an example. In the past two months, the company has added various capabilities to its product, including PPT generation, video generation, and audio generation. Its official website also lists many ready-made templates and user cases. Even though these capabilities rely on external models, startups generally do a better job than OpenAI in terms of ease of use.

Templates shared on Manus' official website

However, when the competition comes down to the capabilities of the underlying model, ChatGPT Agent, which is trained end to end, clearly has an advantage. OpenAI has run ChatGPT Agent through many academic benchmarks, and on some of them it outperforms OpenAI o3 or GPT-4o, reaching the highest level in the industry.

For example, in the "Humanity’s Last Exam" assessment, ChatGPT Agent achieved a new high of 41.6% (pass@1), approximately twice that of OpenAI o3. In the DSBench test, ChatGPT Agent significantly outperformed GPT-4o and was clearly better than human performance in data-analysis tasks.

Test results of Humanity’s Last Exam

On the SpreadsheetBench platform, which specifically measures spreadsheet-editing ability, ChatGPT Agent set a new industry record, with performance twice as good as that of GPT-4o. OpenAI said that in its internal benchmark tests, ChatGPT Agent's capabilities are roughly equivalent to those of an investment-bank analyst with 1-3 years of experience.

In short, OpenAI emphasizes the improvement of the underlying model capabilities brought by ChatGPT Agent, while startups, limited by technology and funds, tend to focus on application innovation.

Early in the morning of July 19th, Ji Yichao, the co-founder of Manus, posted an article saying that Manus will continue to bet on context engineering (in-context learning) rather than end-to-end agents.

He said that at the beginning of the Manus project, the team had to decide whether to train an end-to-end agent using an open-source model or to build an agent on the in-context learning ability of cutting-edge models. The emergence of models like GPT-3 made them realize that context engineering was the right direction, because the capabilities of these models far exceeded those of their previous in-house models.

“If the progress of models is the rising tide, we hope Manus will be the boat, not the pillar fixed on the seabed,” Ji Yichao said. This approach lets them deliver improvements in hours rather than weeks and keeps their product orthogonal to, that is, independent of, the underlying models.

In this technical document, he shared much of Manus' experience in context engineering, such as designing around KV caches and using the file system as context. These engineering choices have significantly improved Manus' response speed and cost-effectiveness.

As an example, Ji Yichao said that KV caching can greatly reduce the time to first token and lower inference cost. With Claude Sonnet, for instance, cached input tokens cost 10 times less than uncached tokens.

Technical document shared by Ji Yichao
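
As a rough illustration of why the cached-token discount matters for an agent loop, the sketch below compares relative input cost with and without a prefix cache. It assumes an append-only context in which every step re-reads everything before it, uses the 10x ratio quoted above (one cached token costs 0.1 of an uncached token), and ignores output tokens and any surcharge for writing to the cache. The function name `input_cost` and the step sizes are hypothetical and are not taken from Manus' document.

```python
def input_cost(step_tokens, cached_ratio=0.1, use_cache=True):
    """Relative input cost of an append-only agent loop.

    step_tokens[i] is the number of new tokens appended before step i.
    The result is measured in units of "one uncached input token".
    cached_ratio is an assumption based on the 10x figure in the article.
    """
    total = 0.0
    prefix = 0  # tokens already processed in earlier steps
    for new in step_tokens:
        if use_cache:
            # The unchanged prefix is read from cache; only new tokens cost full price.
            total += prefix * cached_ratio + new
        else:
            # Without a cache the whole context is re-processed at full price.
            total += prefix + new
        prefix += new
    return total


# Hypothetical run: a 1000-token system prompt, then three 200-token tool results.
steps = [1000, 200, 200, 200]
print("uncached:", input_cost(steps, use_cache=False))
print("cached:  ", input_cost(steps, use_cache=True))
```

In this made-up run the cached loop costs well under half of the uncached one, and the gap widens as the loop gets longer. The saving only holds if the prefix stays byte-for-byte stable between steps, which is why a design "around KV caches" favors append-only context.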

Innovation in context engineering can indeed improve agent performance. The non-profit AI research institute Epoch AI tested ChatGPT Agent on the FrontierMath problem set and said that it answered only 27% of Tier 1-3 questions correctly, with the score falling as difficulty increased.

However, when ChatGPT Agent was allowed to attempt each question 16 times, its score rose from 27% to 49%. Epoch AI said that better prompt design or task-structure support might significantly improve the performance of the current model.

Test results of Epoch AI
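
The gap between one attempt and 16 attempts is commonly summarized with a pass@k metric. The sketch below uses the standard unbiased pass@k estimator from the code-generation literature; whether Epoch AI computed its 49% figure this way is an assumption, and the per-question counts are invented for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k attempts, drawn without replacement
    from n recorded attempts of which c were correct, is correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # too few wrong attempts to fill k draws, so one must be correct
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(correct_counts, n, k):
    """Average pass@k over questions; correct_counts[i] is c for question i."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical data: number of correct answers out of 16 attempts on four questions.
counts = [0, 1, 16, 0]
print("pass@1: ", benchmark_score(counts, n=16, k=1))
print("pass@16:", benchmark_score(counts, n=16, k=16))
```

With these invented counts, a question solved only once in 16 tries barely moves pass@1 but counts fully toward pass@16, which is one way a single-attempt score and a many-attempt score can diverge on the same model.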

In other words, even with the same model, startups can still achieve far better results than the baseline through better prompt engineering and context design.

“How you shape the context ultimately determines how your agent behaves: how fast it runs, how well it recovers, and how far it scales,” Ji Yichao said.

How to Coexist with the Future of Agents

The official launch of ChatGPT Agent marks the point at which the AI Agent race becomes a competition among tech giants. Its impact on society will be no smaller than that of the early large-model boom, and it makes AI taking over human jobs a real possibility.

This change is already happening quietly. Tech giants like Microsoft and Amazon are carrying out large rounds of layoffs. Satya Nadella, the CEO of Microsoft, said at the beginning of this year that 20%-30% of Microsoft's code is now generated by AI. Klarna, a fintech company, announced as early as the beginning of last year that after just one month of use, its AI Agent had handled two-thirds of the company's customer-service chats, equivalent to the workload of 700 full-time human customer-service representatives.

The market research firm MarketsandMarkets said that the global AI Agent market will grow from $5.1 billion in 2024 to $47.1 billion in 2030, with a compound annual growth rate (CAGR) of 44.8%. Deloitte predicts that by 2025, 25% of companies using generative AI will start piloting agents, and this figure will increase to