The next direction of global large-model evolution: OpenAI releases GPT-5.
(OpenAI CEO Sam Altman unveils GPT-5. Image source: OpenAI official website livestream)
Each generation of flagship models from OpenAI, the high-profile US AI (artificial intelligence) startup, sets the global technology agenda for the following six months. On August 7, Pacific Time, the company released GPT-5.
OpenAI CEO Sam Altman described the progression this way: interacting with GPT-3 felt like talking to a high school student, occasionally brilliant but often exasperating. GPT-4o was more like talking to a college student, with real intelligence and practical usefulness. GPT-5 is like conversing with an expert: a doctoral-level specialist in any field, on standby at all times, ready to help you achieve any goal. GPT-5 can not only chat but also get things done for you.
GPT-5 is a system composed of two models: a long-thinking version that can reason deeply, and a high-efficiency version that answers quickly. The system automatically switches between the two depending on the user's question.
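OpenAI has not disclosed how the switching works. Purely as an illustration, a unified system of this kind might route each query by estimated difficulty; the model names and the heuristic below are hypothetical, not OpenAI's actual mechanism.

```python
# Hypothetical sketch of a two-model router. The difficulty heuristic
# and model names are illustrative assumptions, not OpenAI's design.

def estimate_difficulty(prompt: str) -> float:
    """Crude proxy: longer prompts and reasoning keywords score higher."""
    keywords = ("prove", "plan", "debug", "step by step", "analyze")
    score = min(len(prompt) / 500, 1.0)
    score += 0.5 * sum(kw in prompt.lower() for kw in keywords)
    return score

def route(prompt: str, threshold: float = 0.6) -> str:
    """Send hard queries to the deep-thinking model, the rest to the fast one."""
    if estimate_difficulty(prompt) >= threshold:
        return "gpt-5-thinking"   # long-thinking version
    return "gpt-5-main"           # high-efficiency version

print(route("What's the capital of France?"))                   # gpt-5-main
print(route("Prove step by step that sqrt(2) is irrational."))  # gpt-5-thinking
```

A production router would of course use a learned classifier rather than keyword matching; the point is only that one user-facing product can hide two differently priced models behind a single entry point.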
Performance benchmarks disclosed on OpenAI's official website show that GPT-5 surpasses the previous flagship model, OpenAI o3; the long-thinking version of GPT-5 hallucinates at roughly one-sixth the rate of o3. Artificial Analysis, an international market research firm that has long run performance benchmarks on the world's mainstream models, rated GPT-5 the most capable model globally as of its August 8 results.
While performance has improved, GPT-5's inference computing costs have fallen sharply. Test results published on OpenAI's website show that GPT-5 offers better cost performance than OpenAI o3, using 50%-80% fewer output tokens (the billing unit of AI inference; a token can be a word, punctuation mark, number, or symbol).
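To make the claim concrete: since output tokens are billed per unit, reaching the same answer with 50%-80% fewer of them shrinks that part of the bill proportionally. The price and token counts below are placeholders for illustration, not OpenAI's actual rates.

```python
# Illustrative arithmetic only: the price and token counts are
# hypothetical, not OpenAI's published figures.

def output_cost(tokens: int, usd_per_million: float) -> float:
    """Inference cost attributable to output tokens."""
    return tokens / 1_000_000 * usd_per_million

PRICE = 10.0                  # hypothetical $ per 1M output tokens
o3_tokens = 10_000            # tokens an o3-class model might spend on a task
gpt5_tokens = int(o3_tokens * (1 - 0.65))  # midpoint of the 50%-80% reduction

saving = 1 - output_cost(gpt5_tokens, PRICE) / output_cost(o3_tokens, PRICE)
print(f"{saving:.0%} cheaper on output tokens")  # 65% cheaper
```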
Need to Consolidate the "Fragile Advantage"
OpenAI has long led the large-model field and is the world's most highly valued and highest-revenue AI startup. As of August this year, it had raised a further $8.3 billion, bringing cumulative financing to more than $79.7 billion at a valuation of $300 billion.
As of August this year, ChatGPT had 180 million daily active users and 5 million paying enterprise users; as of April, it had 20 million paying individual users.
Media reports earlier said that by the end of July this year, OpenAI was on track for annual recurring revenue (ARR) of $12 billion, up more than 80% year on year. Of that, consumer subscriptions (products such as ChatGPT Plus) contributed $5.5 billion, business and partnerships (the ChatGPT Team and Enterprise deployments) $3.6 billion, API (application programming interface) calls $2.9 billion, and coding-focused products $400 million.
As the world's largest AI startup, OpenAI far outpaces its closest rival, Anthropic, the world's second-largest, in financing, revenue, and valuation.
Anthropic has completed 14 financing rounds since 2023, totaling $18.2 billion, and is currently valued at $61.5 billion, meaning OpenAI's valuation is 4.9 times Anthropic's. As of the end of July this year, Anthropic's annual recurring revenue was expected to be roughly $5 billion; OpenAI's revenue is thus 2.4 times as large.
Despite this lead, OpenAI faces increasingly fierce competition. In the US market, Google's Gemini, Anthropic, and the AI startup xAI are direct rivals whose flagship models trail OpenAI's by no more than three months. In the Chinese market, two open-source model families, Alibaba's Qwen series and the DeepSeek series from the AI startup of the same name, trail OpenAI's flagship by only three to six months.
OpenAI's model iteration has accelerated markedly since 2024, yet over the past year the company has drawn more criticism than ever. Despite frequent releases, performance gains have fallen short of public expectations; several founding team members have left; and its closed-source business model has attracted complaints, with industry wags joking that OpenAI should be renamed "CloseAI".
A July 18 research report by JPMorgan Chase noted that OpenAI's financing goes mainly into computing power and talent: the company may spend roughly $46 billion on compute and salaries over the next four years, and is expected to turn profitable in 2029. JPMorgan also argued that the rise of Google's Gemini 2.5 Pro and China's DeepSeek-R1 shows the large-model market is fiercely competitive, with cost-effectiveness mattering more and more.
Seize the Opportunity on the Eve of the AI Application Explosion
The explosion of AI applications, above all Agents (lightweight AI applications), is the clearest trend in large-model deployment in 2025.
The international IT consultancy Gartner predicts that by 2028, 33% of enterprise software will embed Agents, up from under 1% in 2024, and that 15% of daily work will be completed autonomously by Agents, up from nearly 0% in 2024.
In the first half of 2025, however, Agents were widely considered immature (for details, see "Why Can't We Understand AI Agents?"). The reason: the underlying foundation models were simply not capable enough.
Two of GPT-5's key technological breakthroughs target exactly this problem: its multi-modal capabilities (handling text, images, video, audio, and other complex formats), and its ability to follow instructions and use Agent tools.
At the GPT-5 launch, OpenAI co-founder Greg Brockman used a coding scenario as an example, saying GPT-5 sets a new standard as the best model for agentic coding tasks. You can ask it to complete very complex things; it will start working, call many tools, and keep going for several minutes, sometimes longer, to achieve your goal and follow your instructions, whatever you want to create.
In other words, as GPT-5's multi-modal understanding and Agent tool use mature, a large model can direct multiple Agents to work in concert, giving it the ability to orchestrate multi-agent systems and handle complex tasks.
GPT-5 opens a new front in the competition among foundation models. The leap in base capability means more complex AI applications will be unlocked, and each new batch of applications in turn multiplies AI computing power consumption. The "flywheel" of models, applications, and computing power will spin faster.
In June this year, Wu Di, head of intelligent algorithms at ByteDance's Volcengine and head of VolcArk, explained this logic to Caijing. In his view, in both the Chinese and US markets, foundation model capabilities will keep improving over the next 12 months, along three directions.
First, multi-modal reasoning models (handling text, image, video, audio, and other complex formats) will take the lead; this is already under way. AI will integrate text, images, audio, and video for comprehensive reasoning, greatly enhancing Agents' ability to understand complex real-world information.
Second, video generation models will become mature and usable, with a surge expected by the end of this year. Agents will then not only understand the world but also generate content and simulate processes in more dynamic, intuitive ways.
Third, the ability to handle complex multi-step tasks will improve significantly, with a major breakthrough expected by year-end. This is the crucial step for Agents to mature: once a model can reliably plan and execute tasks involving dozens or even hundreds of steps, the problem of Agents "abandoning tasks" midway will be fundamentally solved.
In Wu Di's view, most current multi-agent applications are "like toys." But on the strength of breakthroughs along these three lines, he offered a final judgment: the accuracy of multi-agent applications will improve markedly by the end of 2025. Once AI applications with visual understanding and reasoning become widespread by then, a single basic task may consume more than 100,000 tokens, and token consumption will climb rapidly.
A New Round of Model Competition Unfolds
The foundation for the rotation of the "flywheel" of models, applications, and computing power is the continuous improvement of model capabilities. In 2025, the large model competition among global technology companies has become more intense, and the pace of large model iteration has accelerated.
In the large-model field, knowledge is refreshed monthly or even weekly; a single paper or model can upend existing technical routes. A senior algorithm engineer told Caijing that large numbers of academic papers appear every week, new technical breakthroughs arrive almost monthly, and the leading model is often overtaken every three or four months.
According to incomplete statistics by Caijing, in the 220 days from January 1 to August 8, 2025, the 11 Chinese and American technology companies in the model race (Alibaba, ByteDance, Tencent, Baidu, Huawei, DeepSeek, Moonshot AI, Google, OpenAI, Anthropic, and xAI) released or iterated at least 32 versions of large models: on average, a new version every 6.9 days.
The update cycle of foundation models keeps shrinking: 161 days from OpenAI's GPT-4.5 to GPT-5; 132 days from OpenAI's o1 to o3; 142 days from xAI's Grok 3 to Grok 4; 128 days between two versions of DeepSeek-R1; 87 days between two versions of DeepSeek-V3; and just 42 days between two versions of Google Gemini 2.5.
The release of GPT-5 will push Chinese and American technology companies into a new round of large-model competition: training stronger models and buying computing power at ever larger scale. That path will not change in the short term.
Today's large models rest on three cornerstones: data, algorithms, and computing power. Development still relies on "brute force producing miracles", that is, trading enormous resource investment for performance gains.
In June this year, Chen Yiran, a professor of electrical and computer engineering at Duke University, told Caijing that the basic route of AI evolution is still brute force. People keep asking when this approach will hit its limit and exhaust its potential, and academia is searching for new paths; but no other effective way has emerged, so the industry has little choice but to continue down the road of "brute force producing miracles".
Chinese technology companies are currently in catch-up mode: Alibaba's Qwen 3, updated in July this year, has for now drawn level with OpenAI's o3, released in April. The arrival of GPT-5 means a new round of catching up is about to begin.
Caijing has learned that one of the core goals this year for Tongyi Laboratory, Alibaba's large-model R&D department, is to stay ahead in model performance, download volume, and the number of derivative models.
Zhou Jingren, CTO of Alibaba Cloud and head of Tongyi Laboratory, told Caijing in a group interview at the ModelScope Developer Conference in June that model performance must be competitive enough to prove itself in authoritative, recognized benchmark tests.
He also mentioned that Tongyi Laboratory always regards tracking and analyzing global cutting-edge technology trends as part of its daily work. They not only pay attention to papers at