
GPT-5.5 has just been released. It is more powerful, faster, and more expensive. An NVIDIA engineer who took part in internal testing said, "Losing it feels like having an amputation."

量子位 (QbitAI) · 2026-04-24 08:14
The model optimizes its own inference infrastructure, increasing the speed by 20%.

GPT-5.5 has just arrived.

Officially positioned as "a new type of intelligence for practical work and agents."

This time, Altman didn't step forward himself to say, "I was so shocked during my first session that I sat there stunned. It was like watching an atomic bomb go off." Instead, he let a group of early testers speak for him.

Among them was an NVIDIA engineer. When early testing ended and he temporarily lost access to GPT-5.5, he said this:

Losing GPT-5.5 is like getting an amputation.

Joking aside.

The cooperation between OpenAI and NVIDIA this time is unprecedented.

First, GPT-5.5 was designed jointly with NVIDIA's GB200 and GB300 NVL72 systems. From training to deployment, model and hardware have shaped each other from the start.

Second, Altman promoted Codex across the entire NVIDIA company and even showed the emails he exchanged with Jensen Huang.

Let's first look at the data on the results of this cooperation.

Compared with the previous version, GPT-5.4, the new model pulls ahead in three areas: code, knowledge work, and scientific research.

There are two ways to interpret the results of the comprehensive test, the Artificial Analysis Intelligence Index:

GPT-5.5 consumes fewer tokens than Claude Opus 4.7 and other models to achieve the same score.

Or, when consuming the same number of tokens, GPT-5.5 can complete more tasks.

But what's most surprising is not the benchmark scores.

In the past, every time a model was upgraded, "more powerful" and "slower" almost came as a package deal.

This is the price of the Scaling Law. Larger models with more parameters require more thinking time. Users pay for both intelligence and latency.

GPT-5.5 breaks this ironclad rule.

In a real production environment, its per-token latency is comparable to that of GPT-5.4, and it also consumes fewer tokens to complete the same tasks.
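The arithmetic behind this claim is worth making explicit: if per-token latency stays flat while the token count drops, end-to-end task time drops proportionally. A minimal sketch, with every number invented purely for illustration (the article reports no concrete figures):

```python
# All figures below are hypothetical, chosen only to illustrate the
# relationship between token count and end-to-end latency.

PER_TOKEN_MS = 25        # assumed per-token latency, identical for both models

old_tokens = 12_000      # assumed tokens GPT-5.4 spends on one task
new_tokens = 9_000       # assumed tokens GPT-5.5 spends on the same task

# End-to-end time is roughly (tokens generated) x (per-token latency).
old_latency_s = old_tokens * PER_TOKEN_MS / 1000   # 300.0 s
new_latency_s = new_tokens * PER_TOKEN_MS / 1000   # 225.0 s

# Same per-token speed, 25% fewer tokens -> the task finishes 25% sooner.
print(old_latency_s, new_latency_s)  # 300.0 225.0
```

So "comparable per-token latency" plus "fewer tokens per task" is enough, on its own, to make the new model faster in practice.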

It is more efficient and more powerful.

(But the price has doubled.)

As of press time, the latest version of Codex can already use GPT-5.5.

The context window has also been upgraded to 400K.


Supercharging Programming

Programming is the field where GPT-5.5 has made the most significant improvements.

When using the previous generation of models, you still had to carefully break down tasks, watch it step by step, and be ready to correct it at any time.

GPT-5.5 is different. You just throw in your requirements, and it will break them down, execute, and check on its own. You only need to look at the results.

OpenAI demonstrated a 3D action game generated by GPT-5.5 under Codex, which can run directly on the web.

The demo implements a combat system, enemy encounters, and HUD feedback in TypeScript/Three.js, with environmental textures generated by GPT.

On Terminal-Bench 2.0, a hardcore benchmark that measures complex command-line workflows, GPT-5.5 scored 82.7%.

The previous version, GPT-5.4, scored 75.1%, and the current strongest competitor, Claude Opus 4.7, scored 69.4%.

You can read it this way: GPT-5.4 got stuck on roughly a quarter of tasks at this level (a 24.9% failure rate), while GPT-5.5 now fails fewer than a fifth of them (17.3%).

Now, let's hear from the early testers:

Dan Shipper, an early tester, is the CEO of a startup and an active AI product developer. He conducted an experiment.

After his app went live, there was a bug. He hired a top engineer to refactor it. After some effort, the engineer came up with a solution.

Then Shipper ran a retrospective test: he fed the buggy code to the model to see whether it would arrive at the same decision as the engineer on its own.

GPT-5.4 couldn't do it, but GPT-5.5 could.

Shipper said that this was the first time he felt real "conceptual clarity" from a programming model.

It's not just about responding; it's about understanding the problem and figuring out how to solve it on its own.

More and more senior engineers are reporting the same thing: GPT-5.5 is significantly stronger than GPT-5.4 and Claude Opus 4.7 in terms of reasoning and autonomy.

It can detect problems in advance and predict testing and review requirements without explicit prompts.

Programming is just the beginning. The same leap in capabilities is spreading to the fields of knowledge work and scientific research.

Beyond Programming

What GPT-5.5 does in Codex goes far beyond writing programs. It can generate documents, organize tables, and create PPTs.

OpenAI has repeatedly emphasized that it understands what you want better than the previous generation.

More importantly, it can use tools on its own and check if the output is correct. If you give it a vague idea, it can fill in the rest.

Here's an interesting piece of data: More than 85% of OpenAI's employees use Codex for work every week. (What about the other 15%?)

Let's first look at the evaluation results.

In the knowledge work benchmark test GDPval, GPT-5.5 scored 84.9%, 4.6 percentage points higher than Claude Opus 4.7.

FrontierMath Tier 4 is one of the most difficult current mathematical benchmarks, with questions drawn from unpublished papers and open problems posed by top researchers.

GPT-5.5 Pro scored 39.6% on this test. Claude Opus 4.7 scored 22.9%, just over half of GPT-5.5 Pro's score.

What's really interesting is how scientists are using it.

Bartosz Naskręcki, an assistant professor of mathematics at Adam Mickiewicz University in Poland, gave Codex a single sentence. Eleven minutes later, an algebraic geometry visualization application was up and running.

The application draws the intersection curve of two quadric surfaces, highlights it in red, and converts that curve into standard Weierstrass form using the Riemann–Roch theorem. He later added a more stable singularity-visualization feature.

Just one sentence, and it took only 11 minutes. In the past, it would have taken half a day just to set up the project framework.

Derya Unutmaz, an immunology professor at the Jackson Laboratory for Genomic Medicine, used GPT-5.5 Pro to analyze a gene expression dataset: 62 samples with nearly 28,000 genes. In the end, he produced a complete research report.

He said that this would have taken the team several months.

OpenAI's positioning of GPT-5.5 in scientific research can be accurately summarized in one sentence: It is no longer like a one-time answer engine but more like a "research partner."

Early testers don't just use it to look up information. They use it for multiple rounds of paper revision, picking out flaws in the argument point by point, and proposing new analysis plans. It remembers the entire research context, and each round of conversation is built on the previous one.

GPT-5.5 has achieved a major breakthrough in the field of mathematics.

Ramsey numbers are one of the core problems in combinatorial mathematics.

Put simply, they ask how large a network must be before a certain kind of order is guaranteed to appear.

For example, among six people, there must be three who know each other or three who don't know each other. This is the simplest form of Ramsey's theorem.
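This six-person fact, R(3,3) = 6, is small enough to verify exhaustively. A minimal Python sketch (unrelated to GPT-5.5's actual proof) that brute-forces every red/blue coloring of the pairwise relations:

```python
from itertools import combinations, product

# Check the simplest Ramsey fact: among any 6 people, every 2-coloring
# of the 15 pairwise relations ("know each other" / "don't") contains a
# monochromatic triangle, while among 5 people a counterexample exists.

def has_mono_triangle(n, coloring):
    """coloring maps each pair (i, j) with i < j to color 0 or 1."""
    for a, b, c in combinations(range(n), 3):
        if coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]:
            return True
    return False

def ramsey_33_holds(n):
    """True if every 2-coloring of K_n's edges has a monochromatic triangle."""
    edges = list(combinations(range(n), 2))
    return all(
        has_mono_triangle(n, dict(zip(edges, colors)))
        for colors in product((0, 1), repeat=len(edges))
    )

print(ramsey_33_holds(5))  # False: a 5-cycle coloring avoids all triangles
print(ramsey_33_holds(6))  # True: six people always suffice
```

The brute force stops working almost immediately as the numbers grow, which is exactly why general Ramsey results require proofs rather than search.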

It has been a tough nut to crack in the mathematical community for decades, and the asymptotic properties of off-diagonal Ramsey numbers have long remained unsolved.

GPT-5.5 found a new proof path. It is not a replication of known methods but a genuinely new route. The proof was subsequently confirmed correct in Lean, one of the most rigorous formal verification tools in the mathematical community.

An AI has made an original contribution verified by a formal tool in the core field of pure mathematics.

A year ago, this was unimaginable.

The Secret of Being Stronger and Faster