
After extended hands-on use of Google's Nano Banana, we found that it has two sides.

Zhibaidao · 2025-09-15 09:53
What did Google's Gemini image model do right when the "Banana" storm hit?

Less than two weeks after its launch, Google's Nano Banana has generated over 200 million images worldwide, with users in the Asia-Pacific region showing the greatest enthusiasm.

This "rising star" in the field of image - editing models was just a mysterious code of unknown origin in the global artificial intelligence community last month. On the anonymous AI model battle platform LMArena, it quickly topped the leaderboard with astonishing performance. Its ability to handle complex instructions, maintain character consistency, and understand context details easily defeated all well - known opponents, including OpenAI and Midjourney. For a time, speculations about what "Nano Banana" really was were rife.

The mystery was soon solved. Google officially announced that the dark horse was its newly upgraded image generation and editing model, Gemini 2.5 Flash Image. It has been integrated into Google's AI application Gemini as a major update and is built on technology from Google DeepMind.

According to "Zhibaidao", the emergence of "Nano Banana" is not just another iteration of the image model. It indicates that Google is trying to transform AI into a "creative collaborator" deeply embedded in the workflow, aiming to break the current binary pattern between the artistic aesthetics dominated by Midjourney and the text productivity tools dominated by OpenAI in the market, and open up a new track centered on the "workflow".

01 Redefining "Photo Editing": Editing Reality Like Having a Conversation

Traditional AI image tools mostly follow a "question-and-answer" interaction mode: users rack their brains to craft the perfect prompt, and the model generates the result in one shot. Subsequent modifications, whether through Midjourney's "Vary" function or DALL-E's inpainting, feel like independent, discrete operations.

"Nano Banana" introduces a new mode of a "creative partner". Users can initiate an initial instruction and then iteratively optimize the generated image through continuous natural - language conversations. This multi - round editing ability enables the AI to remember the context, understand the user's continuous intentions, and thus achieve progressive and refined adjustments.

"Zhibaidao" tried to let the model generate an "empty room", then said "paint the walls light yellow", followed by "add a bookshelf by the wall", and finally "place a chandelier, a sofa, and a carpet". Throughout the process, "Nano Banana" always maintained an overall understanding of the scene, and each modification was based on the previous one rather than starting from scratch.

According to "Zhibaidao", this interaction mode greatly lowers the usage threshold, allowing complex visual concepts to be gradually realized through the most natural form of conversation. It transforms the user's role from a "prompt engineer" to a real "creative director". The user's value lies not only in putting forward the initial concept but also in polishing and perfecting the final work through continuous interaction with the AI, which is closer to the natural thinking process of human creators.

Behind the conversational experience lie the model's four core capabilities, which together form the disruptive capability matrix of "Nano Banana".

First is character and style consistency. Previous models struggled to keep the same character's facial features, clothing, or overall style consistent across multiple images. "Nano Banana" makes a breakthrough here, ensuring that a person, a pet, or even a branded product retains its core appearance across different scenes, poses, and outfits.

Second is multi-image fusion. Users can upload several different images, and the model understands and seamlessly blends their elements, subjects, or styles into a new, logically consistent scene.

Third is precise local editing. Without complex selection or masking tools, users can modify specific areas of an image with a plain text description. Whether it's "remove the stain on the T-shirt", "blur the background of the photo", or "change the person's pose", the model can accurately locate and execute the operation while keeping the rest of the image intact and coherent.

Finally, there is design and style transfer. The model can extract design elements such as colors, textures, or patterns from one image and apply them to objects in another. Examples officially demonstrated by Google include "design a pair of rain boots with the colors and textures of petals" and "design a dress with the patterns of butterfly wings", showing its potential for cross-concept creative combinations; a minimal API sketch of these editing calls follows below.
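The local-editing and fusion capabilities above can also be exercised in a single API call by passing one or more images together with a text instruction. This is a rough sketch under the same assumptions as the earlier example (google-genai SDK, "gemini-2.5-flash-image-preview"); the input and output file names are illustrative placeholders.

```python
# Sketch of one-shot local editing and multi-image style transfer via the Gemini API.
# Assumptions: google-genai and Pillow installed, placeholder file names and API key.
from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key
MODEL = "gemini-2.5-flash-image-preview"

# Precise local edit: describe the change in plain text, no masks or selections.
photo = Image.open("tshirt_photo.png")  # illustrative input image
edit = client.models.generate_content(
    model=MODEL,
    contents=[photo, "Remove the stain on the T-shirt and keep everything else unchanged."],
)

# Multi-image fusion / style transfer: pass several images plus one instruction.
boots = Image.open("rain_boots.png")
petals = Image.open("petals.png")
fused = client.models.generate_content(
    model=MODEL,
    contents=[boots, petals,
              "Redesign the rain boots using the colors and texture of the petals."],
)

# Save whichever response parts contain inline image data.
for filename, resp in [("edited_tshirt.png", edit), ("petal_boots.png", fused)]:
    for part in resp.candidates[0].content.parts:
        if part.inline_data is not None:
            with open(filename, "wb") as f:
                f.write(part.inline_data.data)
```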

As some technology media have put it, "Nano Banana" is becoming "everyone's Photoshop". It turns professional image-processing skills that used to take years to master into tools ordinary people can use through everyday language. For the general public, this means easily creating more personalized content for social media, producing unique visuals for personal projects, or simply realizing all kinds of wild ideas for fun.

For professional creators such as graphic designers, illustrators, and visual artists, "Nano Banana" can take over a large amount of repetitive, tedious execution work. Creating 15 slightly different sized versions of an asset for an ad campaign, or swapping in different backgrounds for a series of product shots, used to be time-consuming, labor-intensive tasks; now the AI can complete them automatically. That lets professionals devote more energy to higher-level brand strategy, complex layout design, and the final detail work that determines the quality of a piece.

The model has also quickly won recognition from professionals. Daniel Barak, global head of creativity and innovation at WPP, the world's largest advertising and communications group, said the model has already shown powerful applications in the retail and consumer goods industries, and that WPP plans to integrate it into its AI marketing platform, WPP Open.

02 What Did Google Do Right?

Before its official identity was revealed, "Nano Banana" had already proven itself on the anonymous LMArena battle platform. In human preference tests, especially on image-editing tasks, it ranked first with an Elo score of 1362, far ahead of its competitors.

Beyond the model's own technical innovations, Google has skillfully leveraged its larger ecosystem. "Nano Banana" inherits the "native world knowledge" of the Gemini large model, meaning it is not just an image generator but a system with common sense and reasoning ability that can understand and generate images with deep semantic accuracy. For example, it can read hand-drawn charts and answer questions about them, or generate pictures that fit local cultural conventions based on the user's location.

Commercially, Google has adopted an aggressive pricing strategy: via the API, generating an image costs about $0.039. This low price dramatically lowers the barrier for developers and enterprises to run large-scale, high-frequency image generation. According to "Zhibaidao", this is a classic platform play, aiming to grab market share quickly through price and to encourage developers to build an application ecosystem around the API.
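For context, the roughly $0.039 figure is consistent with Google's published token-based pricing, assuming about 1,290 output tokens per generated image at $30 per million output tokens; the quick estimate below uses those assumed numbers to show how the cost scales for batch workloads.

```python
# Back-of-the-envelope cost estimate for Gemini 2.5 Flash Image API usage.
# Assumed figures: ~1,290 output tokens per image and $30 per 1M output tokens;
# verify both against Google's current pricing page before relying on them.
TOKENS_PER_IMAGE = 1290
PRICE_PER_MILLION_OUTPUT_TOKENS = 30.00  # USD

cost_per_image = TOKENS_PER_IMAGE / 1_000_000 * PRICE_PER_MILLION_OUTPUT_TOKENS
print(f"per image: ~${cost_per_image:.4f}")              # ~$0.0387, i.e. the ~$0.039 cited above
print(f"10,000 images: ~${cost_per_image * 10_000:,.0f}")  # ~$387 for a large product-shot batch
```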

Google's strategy is also clear: it isn't trying to be best at everything. Midjourney remains king of artistic aesthetics, and OpenAI holds an advantage in general-purpose use thanks to ChatGPT's huge user base. Google has instead chosen the workflow as its breakthrough point. By building a tool that, at low cost, excels at the 80% of tasks professionals run into most often (maintaining consistency, making repeated revisions, generating images quickly), it has precisely targeted the enterprise market, where practicality and integration matter most.

This is a classic strategy of serving the mainstream market with a product that is "more user-friendly and cheaper". Even if it doesn't top every artistic benchmark, its overall value in commercial applications may be higher.

03 The "Other Side" of the Banana, Imperfect Reality and Unsolved Ethical Questions

Although "Nano Banana" has brought many breakthroughs in function and concept, it is far from perfect. The actual user experience and in - depth examination have revealed a series of technical shortcomings.

First is the loss of resolution and detail. A review by the technology outlet CNET noted that when the model processes high-quality photos uploaded by users, the output images often drop in resolution, blurring fine details from the original. For photographers and professional designers who care about image quality, this is an unacceptable flaw.

Second is a rigid format limitation. Currently, the model forces square (1:1) output and ignores user instructions to change the aspect ratio, which greatly restricts its use across different media. Some advanced users have found workarounds that "trick" the model into producing other ratios with carefully crafted instructions, but this adds friction and uncertainty.

Its performance is also unstable. On some seemingly simple tasks, such as removing reflections from glass, the model can fail repeatedly, with each attempt further degrading image quality and even distorting faces in the picture. Some Reddit users have complained that the publicly released version seems weaker than the anonymous version previously tested on LMArena, with diminished consistency and instruction following.

Notably, in trying to avoid safety and ethics controversies, the new version of "Nano Banana" seems to have swung to the other extreme: over-censorship. Many users report that its built-in safety filter is extremely strict and often refuses perfectly harmless requests that comply with community norms. This "better to block a thousand by mistake than let one slip through" approach does reduce the risk of controversy to some extent.

In addition, every image generated or edited by "Nano Banana" carries a visible watermark and an invisible digital watermark called SynthID. Developed by Google DeepMind, the technology is meant to mark content as AI-generated at the source, helping combat misinformation and malicious misuse.

Google has also announced usage limits for the different Gemini service tiers: free users can generate 100 images per day; Google AI Pro subscribers, 1,000 per day; and Google AI Ultra subscribers, also 1,000 per day but with higher quotas for other Gemini features.

The release of "Nano Banana" also brings a profound question about the future: Is this the "iPhone moment" marking a new era of human - machine interaction, or just another intensifying arms race among technology giants?

According to "Zhibaidao", in terms of its core contribution, its real breakthrough lies in shifting the interaction paradigm of visual creation from "writing instructions" to "having a conversation". This workflow - centered model that emphasizes iteration and refinement is undoubtedly closer to the natural creative thinking of humans than any previous tool. Just as the multi - touch technology of the iPhone made complex calculations intuitive and easy to use, the conversational editing of "Nano Banana" has greatly lowered the threshold for advanced visual creation and changed the collaborative relationship between humans and AI.

This innovation must also be viewed against the fiercely competitive landscape of generative AI. With "Nano Banana", Google has regained an edge in image editing and workflow integration, but its rivals are not standing still: OpenAI keeps weaving its image-generation capabilities deeper into the ChatGPT ecosystem, and Midjourney still leads on artistic stylization.

The long-term significance of "Nano Banana" may lie not in whether it is the "best" model today, but in the strategic direction it represents: AI as a seamless, intuitive collaborator embedded deep in the daily workflow. After this storm, the creative industry's landscape has changed; the model has accelerated the democratization of creativity and is reshaping the role of professionals.

Google's "banana" may not be the end of the war, but it is undoubtedly the signal flare that has changed the rules of the battlefield. The era of creative workers coexisting with AI "co - pilots" has arrived.

This article is from the WeChat official account "Zhibaidao". Author: Daoge. Republished by 36Kr with permission.