GPT Image 2 Dominates Rankings and Leads Google in Comeback, 5 Months After Altman's "Code Red"

GPT Image 2 reached the top of the Arena, leading by 241 points and setting a new record.

[Introduction] After being held down by Google for half a year, OpenAI has finally launched a counterattack. Just 12 hours after launch, GPT Image 2 topped the Arena text-to-image ranking, leading Nano Banana 2 by 241 points. Arena said this is the largest point gap in the Image Arena text-to-image ranking to date.

On the day of its release, it dominated all three rankings.

Just 12 hours after launch, GPT Image 2 topped all three sub-rankings: Text-to-Image, Single-Image Edit, and Multi-Image Edit.

Arena's exact words: "a clean sweep".

In the main text-to-image ranking, GPT Image 2 scored 1512 points, while Nano Banana 2 scored 1271 points. The 241-point gap is the largest in Arena history.
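To put the gap in perspective, here is a minimal sketch of the arithmetic. It assumes the Arena scores sit on a conventional Elo-style scale, where a 400-point difference corresponds to 10:1 odds; that scale is an assumption, not something Arena states here. The function name `elo_expected_win_rate` is hypothetical and chosen only for illustration.

```python
def elo_expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected head-to-head win probability of A over B.

    Uses the standard Elo logistic curve (400-point scale). Whether the
    Arena leaderboard uses exactly this scale is an assumption.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Scores reported in the article for the main text-to-image ranking.
gpt_image_2 = 1512
nano_banana_2 = 1271

gap = gpt_image_2 - nano_banana_2
p = elo_expected_win_rate(gpt_image_2, nano_banana_2)

print(f"gap = {gap} points")
print(f"implied head-to-head win rate = {p:.1%}")
```

Under that assumption, a 241-point lead implies GPT Image 2 would be preferred in roughly four out of five direct comparisons with Nano Banana 2. This is not the same quantity as the 93% overall win rate Arena reported, which covers match-ups against every opponent rather than one rival.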

"No model has ever dominated the Image Arena with such a gap," the Arena official said.

Across all blind-test match-ups in Image Arena, GPT Image 2 had a win rate of 93%: out of every 100 paired blind selections, people chose the OpenAI image 93 times.

"If DALL-E is regarded as cave paintings and Images 1.0 as ancient art, then Images 2.0 is the Renaissance."

OpenAI opened its press conference by introducing Images 2.0 this way, and Altman even called it a generational upgrade:

This is like suddenly jumping from GPT-3 to GPT-5.

https://www.youtube.com/watch?v=sWkGomJ3TLI

The official OpenAI API documentation gave Images 2.0 its highest-level evaluation.

https://developers.openai.com/api/docs/models/gpt-image-2

But the real story is not in the data.

After being suppressed by Google for half a year, OpenAI finally turned the tables

Let's rewind to August 2025.

Google released Nano Banana. This image-generation model, embedded in Gemini, took off instantly in the consumer market.

Three months later, on the Q3 earnings call, Google CEO Sundar Pichai personally disclosed a set of figures: Gemini's monthly active users grew from 450 million in July to 650 million in October.

Josh Woodward, the head of Google Labs, said that this growth was largely driven by the image-generation boom led by Nano Banana.

In November, Google released Nano Banana Pro. Its text-rendering ability was striking. For the first time, AI images could spell words correctly, and OpenAI was overtaken in the consumer market.

On November 18th, Google struck another blow. As soon as Gemini 3 was released, it topped the LM Arena with 1501 points, becoming the first frontier model to break through 1500 points.

At the end of that month, Altman sent an internal "code red" memo to the entire company.

According to The Information, Altman privately told employees that Gemini 3 might create economic headwinds for OpenAI. Yahoo Finance subsequently reported that under the "code red", OpenAI suspended R&D on other products such as AI Agent and shifted all resources to ChatGPT.

In December, OpenAI hastily launched GPT Image 1.5. It ranked first in the Arena, but failed to explode in the consumer market.

In February 2026, Google struck another blow. Nano Banana 2 made its debut and led the Arena again.

OpenAI lost again.

It wasn't until April 21st, when GPT Image 2 was launched, that OpenAI overtook its competitors and turned the tables.

The drawing AI will be redefined

Why can GPT Image 2 lead by 241 points?

The core answer lies in the architecture.

GPT Image 2 is not a diffusion model of the same generation as Stable Diffusion.

Boyuan Chen, the head of OpenAI's research, said that this is a "generalist model" that is "revamped from scratch". OpenAI's internal name for it is the "image version of GPT".

But during the press briefing, Chen declined to say publicly whether it uses a diffusion or an autoregressive architecture.

Outside observers generally understand it as an "image-generation system with reasoning and planning": it plans before drawing.

OpenAI gave it a new label in the official description: the first image model with native thinking capabilities.

It thinks before drawing, checks itself after drawing, searches for information online when needed, and can produce 8 coherent images at a time.

This is not a paintbrush, but a thinking visual assistant.

The category breakdown of the Arena ranking shows:

In the Text Rendering category, GPT Image 2 scored 316 points higher than the previous generation; it scored 296 points higher in both the cartoon and portrait categories; and in the product, 3D, and realistic categories, the score increase ranged from +247 to +277 points.

Text rendering was first solved by Nano Banana Pro in November 2025, but the accuracy rate was 94% at that time. GPT Image 2 has pushed it to 99%.

OpenAI demonstrated this at its press conference by asking GPT Image 2 to draw a bowl of rice with the model name written on just one grain.

OpenAI President Greg Brockman also demonstrated the model's capabilities on his X account.

The first case is old-photo restoration.

A faded, yellowed old family photo can be instantly transformed into a high-definition color version with just one prompt.

The "high-fidelity image inputs" in the official OpenAI API documentation refer to the model's ability to retain the details of the original image: it accurately reads the details of faded, damaged, and blurred old photos on the input side, then re-renders a clear version on the output side.

In the second case, Brockman reposted a set of test images from user @doodlestein: using the same complex prompt to ask GPT Image 2 to draw a mathematical explanation diagram.

He commented that even with complex prompts, GPT Image 2 can generate images with different styles.

@doodlestein tested GPT Image 2 by using the same prompt to draw a linear-algebra explanation diagram. The model drew 4 completely different versions at once: even for the same lesson combining the Mona Lisa and eigenvectors, the composition, color scheme, and information density of each version were completely different.

The real value of this case is not that the model can draw mathematical diagrams, but that it addresses a major pain point of AI image generation over the past two years: single outputs and poorly controllable variations.

For the first time, GPT Image 2 has made "getting 4 completely different directions from one prompt" a product-level capability.

A senior tester of LM Arena commented:

The gap between GPT Image 2 and Nano Banana Pro is as large as the gap between Nano Banana Pro and DALL-E.

It has crossed an entire generation.

A manga-style comic page generated by GPT Image 2 in Thinking mode: starting from a simple prompt, the model maintains character consistency and lays out a multi-panel plot.

DALL-E is retired, and Adobe and Canva are cornered

On release day, downstream tools integrated the model even faster than the tech community expected.

Figma, Canva, Adobe Firefly, fal, and Hermes Agent all completed integration on April 21st.

The API pricing also carries a hidden message:

High-quality image output costs $0.21 per image; ChatGPT Plus costs $20 per month, with image generation already included.

Behind this price difference, there may be the largest industrial restructuring in the image - generation industry in 2026.
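The price difference can be made concrete with a short break-even calculation. This is a rough sketch only: it assumes every image is billed at the quoted high-quality API rate and ignores any usage caps on ChatGPT Plus, which the article does not specify. The helper names are hypothetical.

```python
# Prices quoted in the article, held in cents to avoid floating-point rounding.
API_PRICE_CENTS = 21      # $0.21 per high-quality image via the API
PLUS_PRICE_CENTS = 2000   # $20 per month for ChatGPT Plus


def api_cost_cents(images: int) -> int:
    """Total API cost in cents for a given number of high-quality images."""
    return images * API_PRICE_CENTS


def break_even_images() -> int:
    """Smallest monthly image count whose API cost reaches the Plus price."""
    # Ceiling division using integers only.
    return -(-PLUS_PRICE_CENTS // API_PRICE_CENTS)


n = break_even_images()
print(f"break-even at {n} images per month")
print(f"API cost at {n - 1} images: ${api_cost_cents(n - 1) / 100:.2f}")
print(f"API cost at {n} images: ${api_cost_cents(n) / 100:.2f}")
```

Under these assumptions, a user generating fewer than about a hundred high-quality images a month pays less through the API, while heavier users come out ahead on the flat subscription.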

A photorealistic candid generated by GPT Image 2. Coast, cloudy day, vintage car, film texture: this kind of visual effect, which used to require professional photographers for on-location shooting and post-production, can now be achieved at an API cost of $0.21 per image. OpenAI researcher Gabriel Goh said that photorealism is the capability of this model that he is most excited about.

On May 12th, DALL-E 2 and DALL-E 3 were officially retired.

They were the pioneers that initiated the entire AIGC visual revolution in 2022. Three years later, they were sent into history by their successor from OpenAI.

OpenAI mentioned in the official release description:

Images are not decorations, but a language. A good image does the same thing as a good sentence: select, arrange, and reveal.

This represents a shift in product philosophy.

Of course, there are also opposing voices. ZDNet found in actual tests that GPT Image 2 could not accurately reproduce brand logos, and even ZDNet's own logo was drawn crooked.

Nano Banana 2 still has an advantage in portrait realism and multi-reference consistency.

Although GPT Image 2 is not perfect yet, the landscape of the race has changed.

The rendering era is over, and the reasoning era has just begun

Google integrates reasoning into the image model. OpenAI integrates image tools into the reasoning model. The 241-point Elo gap measures the difference in their architectures.

This assessment from implicator.ai divides image generation into two eras.

From 2022 to 2025, it was the rendering era.

The goals of DALL-E, Midjourney, and Stable Diffusion were all to "draw realistically". The model was the paintbrush, the user was the painter, and the prompt was the sketch.

GPT Image 2 represents the reasoning era.

The model thinks before drawing, and it can search, self-check, and complete tasks. It is not a paintbrush, but a drawing assistant.

What really deserves attention about the release of GPT Image 2 is that image generation is moving towards "thinking".

In the short term, Black Forest Labs (Flux 2) may be in the most trouble.

Kingy AI said bluntly that Black Forest Labs is a diffusion-first vendor, and the entire technical pipeline of Flux 2 conflicts architecturally with the "token-by-token" reasoning route.

It has to either integrate or rewrite; there is no third way.

In the medium term, Google may counterattack in the next quarter. Nano Banana 3 or Imagen-Reason will come out soon.

In the long term, the impact of this event goes far beyond image generation.

When AI starts to use "thinking" to produce images, video, audio, and code, the entire paradigm of generative AI will change.

When Altman wrote "code red" in the memo last December, he probably didn't expect to return to the top of the Arena in this way five months later.

But the real significance of this counterattack may not be that OpenAI defeated Google, but that OpenAI rewrote the rules of the image-generation race.

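The views expressed in this article are solely those of the author. The 36Kr platform only provides information storage services.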
