Stop it, GPT-5-Codex. The world out there is already full of AI programming agents.
OpenAI has officially launched GPT-5-Codex, a model built specifically for agentic programming, with markedly better performance in code refactoring, code review, and defect discovery. Its dynamic resource allocation makes the model more efficient on lightweight requests and more thorough on complex tasks. As the battle over programming agents heats up in 2025, can GPT-5-Codex break through?
Just now, the "brand-new" version of GPT-5 has launched!
This time, OpenAI directly uses the Codex brand name as the suffix for the new model, GPT-5-Codex!
The new model will be even more proficient in intelligent coding!
Even though only one-third of 2025 remains, the competition among tech giants over "programming agents" is still white-hot!
OpenAI's official blog has redefined "autocompletion" and upgraded it to "agent-complete".
In the sixth episode of the OpenAI podcast, President Greg Brockman and Codex lead Thibault Sottiaux talked at length about GPT-5-Codex and what software development might look like by 2030.
First, let's quickly go through this major update.
The newly released GPT-5-Codex is a version of GPT-5 purpose-built for agentic coding.
GPT-5-Codex has comprehensive "dual-mode" expertise:
Immediate collaboration: cooperate with developers in real time, quickly answer questions, and fix minor bugs.
Independent execution: autonomously advance complex tasks (such as large-scale refactoring and cross-file debugging) for extended periods.
In simple terms, GPT-5-Codex is not only fast but also more reliable.
GPT-5-Codex is more responsive in interactive use: small tasks feel almost instantaneous, while large tasks can run continuously for hours.
OpenAI's internal tests show it completing a large-scale refactor in one continuous 7-hour run.
Three major performance improvements of GPT-5-Codex
First, on SWE-bench Verified and on code refactoring tasks, GPT-5-Codex outperforms the current state-of-the-art GPT-5-high.
The gap is widest on code refactoring, a task that closely mirrors real-world work: GPT-5-Codex reaches 51.3% accuracy, far above GPT-5-high's 33.9%.
Second, the key feature of this GPT-5-Codex update is dynamic allocation of compute!
According to usage data from OpenAI's internal employees, on the bottom 10% of user requests ranked by the number of tokens the model generates (including hidden reasoning and the final output), GPT-5-Codex consumes 93.7% fewer tokens than GPT-5.
Conversely, on the top 10% of high-complexity requests, GPT-5-Codex invests more thinking time, spending roughly twice as long reasoning about, editing, testing, and iterating on code.
Finally, GPT-5-Codex has been specifically trained to perform code reviews and find critical defects.
According to OpenAI, review comments generated by GPT-5-Codex are less likely to be incorrect or unimportant, letting users focus on the issues that matter:
"Incorrect comments" drop significantly: from 13.7% to 4.4%.
"High-impact comments" rise significantly: from 39.4% to 52.4%.
"Focus on key points": the average number of comments per PR falls from 1.32 to 0.93.
This brings Vibe Coding a step closer to serious engineering!
Why name it Codex?
At the "launch event" of GPT-5-Codex, Greg talked about the origin of Codex.
As early as the GPT-3 era, they noticed that the model could complete a function's code from its docstring. Even then, they believed that "writing code with language models" was feasible.
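For readers who have not seen that early workflow, here is a minimal illustration (the function and the prompt/completion split are hypothetical, not taken from OpenAI): the developer writes only the signature and docstring, and the model completes the body.

```python
# Prompt given to the model: a signature plus a docstring, nothing else.
def moving_average(values, window):
    """Return a list of moving averages of `values` over the given window size."""
    # --- everything below this line is the kind of body a Codex-style model would complete ---
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    for i in range(len(values) - window + 1):
        averages.append(sum(values[i:i + window]) / window)
    return averages


print(moving_average([1, 2, 3, 4, 5], 2))  # [1.5, 2.5, 3.5, 4.5]
```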
In 2021, OpenAI launched the original Codex and partnered with GitHub to create Copilot, exploring what it means to embed AI directly into the development workflow.
The current web interface of Codex
Greg said that programming has always been an area OpenAI pays special attention to: unlike other areas, it offers abundant code data and objective metrics for optimizing model performance.
The "Harness" concept, which predates Vibe Coding
In this discussion, Greg also used the term "Harness" to explain that OpenAI had discovered the magic of "programming with language models" well before Vibe Coding became popular.
A "harness" is literally the tack and reins that connect a horse to a cart or rider, allowing its power to be controlled and put to work.
Greg borrowed the term when talking about Codex to express a similar idea:
The model itself is like the "horse" or the "brain", generating raw power (intelligence, inputs, and outputs).
The harness is like the "reins", an integration layer that connects the model to the external environment (tools, IDEs, terminals, cloud services, and so on) so it can actually execute tasks and have an effect.
In ordinary language-model applications the interface, or harness, is very simple: the model completes a piece of text, follows up for at most a turn or two, and that is the end of it.
But in programming scenarios the text comes "alive", because the code has to actually run and be connected to tools in order to do anything.
People have therefore realized that the harness matters almost as much as the intelligence of the model itself, because it determines whether the model is truly usable.
What OpenAI calls the harness integrates the model with the rest of the infrastructure, allowing the model to actually take actions in its environment.
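To make the idea concrete, here is a minimal sketch of what such a harness loop can look like. It is not OpenAI's implementation; the call_model() and run_shell() helpers are hypothetical stand-ins for a model API and a tool executor.

```python
import subprocess

def call_model(conversation):
    """Hypothetical stand-in for a model API call.

    A real harness would send `conversation` to a model endpoint and get back
    either a shell command to run or a final answer. Here we simply finish.
    """
    return {"type": "final", "content": "done"}

def run_shell(command):
    """Execute a command proposed by the model and capture its output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

def harness(task, max_steps=10):
    """Agent loop: the model proposes actions, the harness executes them and
    feeds results back, until the model declares the task finished."""
    conversation = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(conversation)
        if reply["type"] == "final":           # model says it is done
            return reply["content"]
        output = run_shell(reply["content"])   # model asked to run a tool
        conversation.append({"role": "tool", "content": output})
    return "step limit reached"

print(harness("run the test suite and fix failures"))
```

The point of the sketch is the loop itself: the harness, not the model, is what runs commands and feeds the results back into the conversation.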
Performance and user experience
Low latency is a major highlight of GPT-5-Codex. Code completion has to return within roughly 1.5 seconds; otherwise the user experience suffers.
GPT-5-Codex can execute long-running tasks continuously, which makes it particularly suited to large-scale refactoring and migration work.
After this update it also supports multiple modes of interaction: vibe coding in the terminal, IDE editing, GitHub integration, Cursor integration, and more, to fit different development habits.
OpenAI's internal practices
In addition to GPT-5-Codex, Greg also revealed more inside information.
OpenAI has incubated several key tools in its internal practices to help the team explore the potential of AI programming agents.
First is 10x, an internal prototype that initially ran in the terminal and significantly improves development efficiency.
It supports asynchronous, long-running execution: engineers can even close their laptops and let tasks keep running. That is why it is credited with "ten-fold productivity", but it has not been released publicly because it is not yet fully polished.
Second is Agents.md, which is a documentation file in the code repository, similar to a README specifically written for Codex.
It can compress the context, reduce the burden on the model to explore the code, and store the team's development preferences (such as test locations and style conventions). In this way, Codex can understand the project environment faster and execute tasks more efficiently.
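As an illustration only (the entries below are hypothetical, and each team decides what goes into the file), such an Agents.md might look like this:

```markdown
# Agents.md (illustrative example, not from OpenAI's repositories)

## Project layout
- Application code lives in `src/`, tests in `tests/`.

## Conventions
- Run `pytest tests/` before proposing a change.
- Follow the existing formatting; do not reformat unrelated files.

## Preferences
- Prefer small, focused pull requests with a one-paragraph summary.
```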
Finally, the Code Review Agent is the tool that impressed people most in the internal pilot.
It can judge whether a PR's intent and implementation are consistent, check dependencies, and find bugs that human reviewers may miss.
The internal team even relied on it to review dozens of PRs the night before launch, and shipped with almost zero bugs.
The discussion also touched on software development in 2030: it will no longer be "humans write the code with tool assistance" but "AI writes most of the code while humans supervise and design the architecture".
Developers become more like team commanders, focusing on strategic questions and creative design, while the tedious, repetitive, and risky work is handled by AI agents.
Stop it, GPT-5-Codex
Now, programming agents have become the focus of major AI giants, and the competition is fierce!
OpenAI's release of GPT-5-Codex this time is another "official announcement to join the battle".
However, stop it! The market is already full of programming agents!
Let's take stock of how many programming agents have appeared in China and abroad this year~
Mainstream general-purpose programming agents abroad
Cursor: deeply integrated into the IDE, with an agent mode; it can retrieve local code and perform cross-file operations and project-level refactoring.
Claude Code CLI: capable of code diffs, tool invocation, and rapid prototyping.
Gemini CLI: has an advantage in context window size and is strong at large-scale codebase refactoring.
GitHub Copilot + Copilot extensions
Representative domestic products/platforms
The domestic market is also moving fast in this area. Many large companies are building combinations of programming agents and coding models, and there are already plenty of models and products dedicated to programming.
Tencent's code assistant CodeBuddy
Tongyi Qianwen's Qwen3-Coder
ByteDance's TRAE
Baidu's Wenxin Agent Platform
DeepSeek's latest V3.1 series
For example, DeepSeek's official announcement notes that, compared with earlier DeepSeek models, V3.1 shows significant improvements as a programming agent and at solving complex tasks in command-line/terminal environments.
Overall, although 2025 has been billed as the Year of Agents, the action has centered on programming agents.
Abroad, products such as Cursor, Gemini CLI, and Claude Code emphasize execution ability, large-context refactoring, and seamless integration with IDEs and CLIs.
Domestically, similar products have been launched to compete with Cursor and Claude Code.
The launch of GPT-5-Codex has made the "battle of programming agents" even more intense!
Although OpenAI recognized the potential of "writing code with language models" early on, the following has happened:
The mindshare for AI programming IDEs has been captured by Cursor.
The mindshare for AI programming CLIs has been taken by...