OpenAI dropped two bombshells late at night: GPT-5.1 Pro was rushed out in a direct challenge to Gemini 3.
The AI world moves so fast that it is genuinely hard to keep up.
Grok 4.1 and Gemini 3 Pro were released just two days ago. Today, OpenAI's GPT-5.1 Pro quietly made its debut.
There was no blog post, only a two-sentence official announcement.
GPT-5.1 is known for pairing strong "emotional intelligence" (EQ) with strong reasoning ability (IQ), and the Pro version is expected to push both further.
On the same day, OpenAI's new flagship coding model, GPT-5.1-Codex-Max, officially launched in Codex.
As the name suggests, it is built on GPT-5.1 and trained specifically on agentic tasks across software engineering, mathematics, research, and more.
As a result, GPT-5.1-Codex-Max is more capable, responds faster, and uses fewer tokens.
The new model is designed for long-running, high-intensity development tasks.
Put simply, it can work autonomously for more than 24 hours at a stretch, stay coherent across millions of tokens within a single task, and deliver the finished result.
This is taken as further confirmation that the Scaling Law still holds.
What makes the long-running work possible is that GPT-5.1-Codex-Max is OpenAI's first model with native support for a "compression" mechanism (OpenAI's own term is "compaction"), which lets it work across multiple context windows.
As a result, it can comfortably handle project-scale refactoring, deep debugging sessions, and multi-hour agent loops.
GPT-5.1 Pro is now available to all Pro subscribers.
GPT-5.1-Codex-Max is already available in Codex through the CLI, IDE extension, cloud, and code review; API access is coming soon.
With 2025 drawing to a close, the ultimate AI showdown is imminent. GPT-5.1 Pro or Gemini 3 Pro: which will come out on top?
OpenAI's Most Powerful Programming Model
GPT-5.1-Codex-Max was honed on the "real battlefield".
It was trained on real-world software engineering tasks such as PR creation, code review, front-end development, and Q&A.
Across multiple frontier coding evaluations, it outperforms all of OpenAI's previous models.
On SWE-bench Verified, GPT-5.1-Codex-Max scored 77.9%.
The gains are not limited to benchmark scores: the day-to-day experience has also improved significantly.
It is the first OpenAI model trained to operate in Windows environments, and its training also targeted collaboration in the Codex CLI, making it a better partner there.
Thinking Tokens Reduced by 30%
GPT-5.1-Codex-Max is also more cost-effective to use.
At the same "medium" reasoning effort, it not only outperforms GPT-5.1-Codex but also uses about 30% fewer thinking tokens.
For tasks that are not latency-sensitive, the newly added "Extra High" (xhigh) reasoning effort lets the model think longer in exchange for a better answer.
For daily use, however, OpenAI still recommends the medium setting.
Fewer tokens means meaningfully lower costs in real-world development, which is good news for developers.
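To make the saving concrete, here is a rough back-of-the-envelope sketch. The 30% reduction is the figure quoted above; the function names and the price per million tokens are hypothetical placeholders for illustration, not OpenAI's actual pricing.

```python
def reasoning_tokens_saved(baseline_tokens: int, reduction: float = 0.30) -> int:
    """Thinking tokens saved if usage drops by `reduction` (0.30 = the ~30% quoted above)."""
    return round(baseline_tokens * reduction)


def cost_saved(baseline_tokens: int, price_per_million_usd: float,
               reduction: float = 0.30) -> float:
    """Dollar saving at a HYPOTHETICAL price per million tokens (not a real price)."""
    saved = reasoning_tokens_saved(baseline_tokens, reduction)
    return saved * price_per_million_usd / 1_000_000


if __name__ == "__main__":
    # Assumed workload: 1,000,000 thinking tokens at a made-up $10 per million tokens.
    tokens = 1_000_000
    print(reasoning_tokens_saved(tokens), "tokens saved")
    print(f"${cost_saved(tokens, 10.0):.2f} saved")
```

The saving scales linearly with how much thinking a workload consumes, so the longer a task runs, the more the reduction matters.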
The following demos show the difference in token usage between GPT-5.1-Codex-Max and GPT-5.1-Codex. Even with fewer tokens, the newer model's front-end output matches the older model's in functionality and visual quality.
In the first demo, both models were asked to generate a browser app: an interactive CartPole reinforcement learning sandbox with a small policy-gradient controller, a metrics panel, and an SVG network visualizer.
Top: GPT-5.1-Codex-Max; Bottom: GPT-5.1-Codex
GPT-5.1-Codex-Max completed the task using only 27k thinking tokens, and its code is more concise.
The second demo asks for a solar-system gravity well sandbox that visualizes objects moving in a 2D gravitational potential field and supports dragging to pan the view and orbiting to observe the scene.
Top: GPT-5.1-Codex-Max; Bottom: GPT-5.1-Codex
Here too, GPT-5.1-Codex-Max completed the task with fewer tokens and tighter code.
GPT-5.1-Codex-Max owes much of this power to a brand-new mechanism.
Running Continuously for a Day, All Thanks to "Compression"
The "compression" mechanism lets GPT-5.1-Codex-Max break past the context-window limit and take on tasks that previously could not be completed because the context grew too long.
Examples include complex refactors and long-running agent loops.
It automatically prunes its history, retaining only the most critical context, and so stays coherent over long time spans.
In Codex, when a session approaches the context limit, GPT-5.1-Codex-Max automatically compresses the session to get a fresh context window, repeating the process as many times as needed until the task is complete.
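The loop described above can be sketched in a few lines. This is only an illustration of the general idea (summarize older history when the context is nearly full, keep the most recent messages), not OpenAI's actual implementation. The `Session` class, the 80% threshold, the word-count "tokenizer", and the toy summarizer are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def naive_summary(messages: List[str]) -> str:
    """Toy summarizer (hypothetical): keep the first three words of each message."""
    return "SUMMARY: " + " | ".join(" ".join(m.split()[:3]) for m in messages)


@dataclass
class Session:
    limit: int                                   # assumed context limit, in tokens
    summarize: Callable[[List[str]], str]        # hypothetical summarizer hook
    keep_recent: int = 2                         # recent messages kept verbatim (>= 1)
    threshold: float = 0.8                       # compress at 80% of the limit (assumed)
    history: List[str] = field(default_factory=list)
    compressions: int = 0

    def used(self) -> int:
        return sum(count_tokens(m) for m in self.history)

    def add(self, message: str) -> None:
        """Append a message; compress older history if the context is nearly full."""
        self.history.append(message)
        nearly_full = self.used() >= self.threshold * self.limit
        if nearly_full and len(self.history) > self.keep_recent:
            self._compress()

    def _compress(self) -> None:
        # Replace everything except the most recent messages with one summary.
        old = self.history[:-self.keep_recent]
        recent = self.history[-self.keep_recent:]
        self.history = [self.summarize(old)] + recent
        self.compressions += 1


if __name__ == "__main__":
    session = Session(limit=50, summarize=naive_summary)
    for i in range(10):
        session.add(f"step {i} " + " ".join(["work"] * 8))
    print(session.compressions, "compressions;", session.used(), "tokens in context")
```

A real system would use a model-generated summary and an actual tokenizer; the point here is only that compression can repeat as often as needed while the latest steps stay intact.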
In the following example, GPT-5.1-Codex-Max autonomously refactors the open-source Codex CLI repository.
When the context is nearly full, the model automatically compresses it to free up space and finishes the task without losing progress.
The video has been edited and sped up to show the process more clearly.
Internal tests show that GPT-5.1-Codex-Max can work autonomously for more than 24 hours straight.
Over that time it keeps iterating on its implementation, fixes failing tests, and ultimately delivers a working result.
This ability to sustain long, coherent work is a cornerstone for more general and reliable AI systems.
In METR's evaluation, GPT-5.1-Codex-Max set a new state of the art (SOTA) on long-horizon tasks.
Within OpenAI, 95% of engineers use Codex every week, and the team's pull request count has risen by about 70% since adopting it.
With the CLI, IDE extension, cloud integration, and code review tools all continuing to improve, coding productivity with GPT-5.1-Codex-Max is set to climb further.
Some users were blown away after trying it first-hand.
GPT - 5.1 Pro Launched, First Tests Here
As for GPT-5.1 Pro, as noted at the beginning, OpenAI gave it only a two-sentence note in its release notes.
Although OpenAI did not publish a separate blog post, users who got early access were eager to share their impressions right away.
Regarding the performance of GPT -