
ChatGPT Images 2.0 was released with a bang, outperforming Google's Nano Banana. Is design really over?

Machine Heart (机器之心), 2026-04-22 08:47
Are designers really about to have a breakdown?

At 3 a.m. Beijing time, the live broadcast started on time, and OpenAI released ChatGPT Images 2.0.

According to the introduction, "ChatGPT Images 2.0 is the next step in evolution: a state-of-the-art model capable of handling complex visual tasks and generating precise, ready-to-use visual content."

Perhaps for this reason, OpenAI's official blog post comes in two versions, an image mode and a classic mode, and the content in image mode was generated entirely by the new model!

Blog address: https://openai.com/index/introducing-chatgpt-images-2-0/

In the blog, OpenAI said: "Images are a language, not just decoration. Good images, like good sentences, involve selection, organization, and presentation. They can explain mechanisms, create atmospheres, validate ideas, or build arguments."

The ChatGPT Images 2.0 model has achieved a qualitative leap in following instructions meticulously. It can accurately place and relate objects, render high-density text, and support generation in multiple aspect ratios. Its capabilities in composition and visual aesthetics make the output no longer seem "AI-generated" but more like "intentionally designed."

Moreover, it performs accurately across languages and can draw on extended visual and world knowledge to fill in details for you, producing smarter images from fewer prompts.

To handle the most complex tasks, Images 2.0 introduces "thinking ability" for the first time. When selecting the thinking or pro model in ChatGPT, Images 2.0 can access real-time information online, generate multiple different images from a single prompt, and review its own output. With "thinking," the model can take on more work between ideas and images, especially when accuracy, timeliness, consistency, and visual unity are crucial.

Combining the intelligence of OpenAI's reasoning models with a deep understanding of the visual world, this model elevates image generation from "rendering" to "strategic design," evolving from a tool into a visual system that helps people turn ideas into results that are understandable, shareable, teachable, and buildable.

These capabilities are available to all users of ChatGPT, Codex, and the API starting today.

Higher precision and control

Images 2.0 brings unprecedented specificity and fidelity to image creation. It can not only conceive more complex images but also effectively realize them. It can strictly follow instructions, retain key details, and render fine elements that previous models were prone to distort: small text, icons, UI elements, high-density compositions, and subtle style constraints. It supports a maximum resolution of 2K in the API. The results are no longer "almost right" but "ready to use."

Note that the following screenshot was itself generated entirely by Images 2.0!

Stronger multilingual ability

Previously, image generation models performed more stably in English and languages using Latin alphabets, but had lower precision in other languages, especially when dealing with complex or dense text.

Images 2.0 breaks through this limitation and significantly enhances multilingual understanding, especially in text rendering for Japanese, Korean, Chinese, Hindi, and Bengali. It can not only correctly generate non-English text but also ensure natural and fluent language expression.

This is not just about translating labels; language itself becomes part of the design. From posters and illustrations to diagrams and comics, vision and language can be unified. This makes the model more globally applicable, allowing users to create visual content in real-world language environments.

During the live broadcast, Chen Boyuan, a member of OpenAI's image research team, presented a case. He gave the prompt: "Make an artistic marketing poster for a fictional OpenAI bakery. The poster should be in Japanese language."

The resulting poster fully met the prompt and was also precise in details.

"It is very good at following very detailed instructions. So if you have very specific brand language and design aesthetics - all those things that are crucial for creative work - you can use ChatGPT to create and refine your ideas and get the results you want," said Chen Boyuan.

More mature style expression and realism

Images 2.0 significantly improves the fidelity in various visual styles. It is better at capturing the key features of photos, including those tiny imperfections that enhance realism. It can also stably present various visual languages such as cinematic scenes, pixel art, and comics, and is more consistent in texture, lighting, composition, and details.

Therefore, the model output is closer to the specified style rather than an approximate imitation. This is particularly valuable for game prototype design, storyboard production, marketing creativity, and asset creation for specific media or genres.

Flexible aspect ratios

The new model is more flexible in output forms, supporting various aspect ratios from 3:1 to 1:3, which can be directly adapted to different scenarios such as banners, presentations, posters, mobile interfaces, bookmarks, and social media graphics. You can specify the aspect ratio in the prompt or regenerate an existing image to a new size through preset options.
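The 3:1-to-1:3 range described above can be turned into concrete pixel dimensions. The sketch below is a hypothetical helper, not part of any official SDK: the "WxH" size-string format mirrors how earlier OpenAI image models expressed size, and the exact sizes gpt-image-2 accepts may differ.

```python
from fractions import Fraction

def size_for_aspect(ratio: str, long_side: int = 2048) -> str:
    """Map an aspect ratio like '3:1' to a 'WxH' size string.

    The 1:3 to 3:1 range comes from the article; the 'WxH' format and
    the 2048-pixel long side are assumptions based on earlier OpenAI
    image models, not a documented gpt-image-2 contract.
    """
    w, h = (int(x) for x in ratio.split(":"))
    r = Fraction(w, h)
    if not Fraction(1, 3) <= r <= Fraction(3, 1):
        raise ValueError(f"aspect ratio {ratio} outside the supported 1:3 to 3:1 range")
    if r >= 1:
        # Landscape or square: width is the long side.
        width, height = long_side, round(long_side / r)
    else:
        # Portrait: height is the long side.
        width, height = round(long_side * r), long_side
    return f"{width}x{height}"
```

For example, `size_for_aspect("3:1")` yields a wide banner size, while `size_for_aspect("2:3")` yields a portrait poster size.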

The following shows two examples of non-standard aspect ratios:

Stronger understanding of the real world

Images 2.0 incorporates knowledge up to December 2025, taking the relevance and contextual accuracy of the generated results to the next level. This is particularly crucial for illustrations, educational graphics, and visual summaries, where correctness and clarity are as important as aesthetics.

Its intelligent capabilities are also reflected in end-to-end task processing: integrating information, writing content, and typesetting with a clear structure, reasonable white space, and good visual flow.

Visual thinking partner

After a thinking model is enabled in ChatGPT, the system works through the request more deeply in the background: it can search the web for information, turn uploaded materials into clear visual explanations, and reason about the image's structure before generating it.

In this mode, Images 2.0 is more like a visual thinking partner, helping you advance preliminary concepts into complete finished products and significantly reducing the workload.

It also supports generating multiple different images at once, which is a first in ChatGPT image generation. This makes workflows such as multi-page comics, whole-house design plans, series of posters, or multilingual and multi-sized social media materials highly efficient and feasible.

You no longer need to generate images one by one and stitch them together manually: a single request can return up to eight outputs with consistent characters, consistent elements, and visual continuity.
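The single-request batch described above might look like the following payload sketch. The field names (`model`, `prompt`, `n`, `size`) mirror OpenAI's existing Images API and are assumptions for gpt-image-2; only the eight-output ceiling comes from the article.

```python
def build_batch_request(prompt: str, n: int = 8, size: str = "1024x1024") -> dict:
    """Build one request payload asking for several related images.

    Hypothetical sketch: field names follow OpenAI's existing Images
    API conventions; the up-to-eight limit is taken from the article.
    """
    if not 1 <= n <= 8:
        raise ValueError("the article describes up to eight outputs per request")
    return {
        "model": "gpt-image-2",  # model name as given in the article
        "prompt": prompt,
        "n": n,
        "size": size,
    }
```

A four-panel comic, for instance, would be `build_batch_request("a four-panel comic with a recurring cat character", n=4)`, with consistency across panels handled by the model rather than by per-image prompting.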

Using image generation in Codex

The Images capability is integrated into Codex, enabling visual creation, iteration, and delivery to be completed within the same workspace, expanding its applications in fields such as design, marketing, product, sales, and learning.

For example, you can quickly generate multiple UI directions and prototypes, compare options, and directly transform the best design into a product or web experience without leaving Codex. It can be used with a ChatGPT subscription without an additional API key.

Embedding image capabilities into products through the API

Developers and enterprises can integrate these capabilities into their own products through the gpt-image-2 API, adding high-quality image generation and editing capabilities to existing workflows.

With stronger text rendering, multilingual generation, instruction-following capabilities, and support for more output formats and aspect ratios, the API makes it easier to build image workflows in real business scenarios, such as localized advertisements, infographics, illustrations, educational content, design tools, creative platforms, and web generation products.
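As a concrete instance of the "localized advertisements" workflow mentioned above, the sketch below fans one base creative out into per-language requests. The helper and its prompt phrasing are hypothetical, and the payload fields again follow existing Images API conventions rather than a documented gpt-image-2 schema.

```python
def localized_ad_requests(base_prompt: str, languages: list[str],
                          size: str = "1536x1024") -> list[dict]:
    """Build one request payload per target language for an ad campaign.

    Hypothetical sketch: the language instruction is appended to the
    prompt because the article says the model renders non-English text
    natively; field names are assumptions modeled on OpenAI's existing
    Images API.
    """
    return [
        {
            "model": "gpt-image-2",
            "prompt": f"{base_prompt} All visible text must be written in {lang}.",
            "size": size,
        }
        for lang in languages
    ]
```

Each payload could then be sent as a separate generation request, yielding one localized variant per market from a single source brief.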

Limitations

OpenAI also mentioned the limitations of this model in the blog: Although Images 2.0 is an important improvement, it is still not perfect. For tasks that require complete physical world modeling (such as origami tutorials and complex structures like Rubik's cubes), as well as precise details of hidden, inclined, or reverse surfaces, the model may still perform inadequately.

Extremely high-density or repetitive details (such as fine sand) may also pose challenges. It is still recommended to manually proofread labels and diagrams when they involve precise arrows or component markings.

These are all important directions for future improvement.

In the API, outputs exceeding 2K are currently in the testing phase and may be unstable.

Pricing and availability

ChatGPT Images 2.0 is available to all ChatGPT and Codex users starting today. Advanced outputs with "thinking" ability are available to ChatGPT Plus, Pro, and Business users.

The gpt-image-2 model is available in the API, and the price varies according to image quality and resolution.

OpenAI has also published many examples on its official website; interested readers can browse them there.

We also conducted some simple tests. For example, we asked it to generate the second page of a Chinese college entrance examination math paper, and it looked okay:

In the actual test, we can see that generating an image with ChatGPT Images 2.0 usually goes through multiple steps on the page: Create → Make a draft → Generate a first draft → Build a scene → Refine details → Finish up → Final touch-up → Final fine-tuning.

Next, we continued with: "Generate a traditional Chinese cursive calligraphy piece with a 3:1 aspect ratio containing the full text of Li Bai's 'Jiang Jin Jiu' (《将进酒》, 'Bring In the Wine'), signed ChatGPT Images 2.0":

However,