Google's Nano Banana 2 Fixes Its Weak Spots Overnight: It Draws All Kinds of Diagrams at Half the Price of OpenAI's

Text-to-image generation can now connect to the Internet.

A late-night bombshell: Google has once again shaken up the text-to-image world. Nano Banana 2 went live without warning and immediately topped the leaderboard.

This time, Nano Banana 2 focuses on an "ultra-fast experience" plus "professional image quality". What truly sets it apart, however, is a new ability: "real-time internet connection".

Put simply, this is no longer a model that can only draw. It is hooked into Gemini's full search capabilities, which amounts to giving the image model a "brain" that can look things up.

When a model can understand, retrieve, and generate at the same time, its images are not just "good-looking" but also closer to how information is structured in the real world.

For example, a street scene generated from a single sentence is detailed enough that you can zoom in and read the signs. The distant billboards, road signs, and window displays all look like real photos.

Another example is asking "Chopping Wood Guy" (the Chinese internet's nickname for Google CEO Sundar Pichai) to hand you a cigarette. The character's expression, body language, and the ambient lighting are all on point. Without being told, it is hard to tell at a glance that the image is AI-generated.

Chopping Wood Guy also promoted it personally and highlighted the "window seat" use case. With a single sentence, whether the subject is a bustling city at night or a log cabin in the snowy wilderness, the model produces an accurate "window view" composition. Each frame is based on real geographical and meteorological information, a clear demonstration of how powerful the "real-time internet connection" ability is.

However, "looking realistic" is only the first step. More importantly, the model opens up a new and very practical direction: infographic generation.

Some time ago, a test question for AI models went viral as a meme:

I want to wash my car. The car wash is 50 meters away from me. Should I walk there or drive there?

Many top models failed, answering that "walking is more environmentally friendly". What went wrong? They analyzed only the "50 meters" and ignored the goal: washing the car, which means the car itself has to get to the car wash.

Google simply generated a diagram comparing the logical chains of "walking" and "driving" and arrived at the correct conclusion. This showed off both Gemini's reasoning ability and Nano Banana 2's first-class drawing ability. Netizens called it a "silent show-off".
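The reasoning behind the car wash question can be made concrete with a small sketch. This is purely illustrative and not how Gemini or Nano Banana 2 works internally: the `Option` class, the `choose` function, and the effort scores are all hypothetical names and values invented for this example. It shows that minimizing effort alone picks "walk", while checking the goal first (the car must reach the car wash) picks "drive".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    car_arrives: bool  # does the car end up at the car wash?
    effort: int        # hypothetical cost; lower looks "greener"


def choose(options, goal_requires_car):
    """Drop options that cannot meet the goal, then pick the cheapest."""
    feasible = [o for o in options if o.car_arrives or not goal_requires_car]
    return min(feasible, key=lambda o: o.effort).name


# Hypothetical values: walking 50 meters costs less effort than driving.
OPTIONS = [
    Option("walk", car_arrives=False, effort=1),
    Option("drive", car_arrives=True, effort=2),
]

# Looking only at distance and effort: walking wins.
distance_only = choose(OPTIONS, goal_requires_car=False)

# Taking the goal into account: only driving gets the car washed.
goal_aware = choose(OPTIONS, goal_requires_car=True)

print(distance_only, goal_aware)
```

The point of the sketch is the order of operations: the goal acts as a filter before any cost comparison, which is exactly the step the failing models skipped.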

In the eyes of many netizens, image generation seems to have taken another step forward, bridging the gap with the real world.

However, some netizens have expressed deep concerns. As it becomes increasingly difficult to distinguish between real and fake images, will AI fraud become more prevalent?

Google's answer is "traceability". Content generated by Nano Banana 2 carries a SynthID watermark and is combined with the C2PA content credential system so that platforms can identify its source.

Competition in text-to-image generation is now fierce. On the authoritative image model leaderboard from Artificial Analysis, two of the top three spots are held by the Nano Banana series. Nano Banana 2 ranks first, and its image editing ability ranks third, yet its price is only half that of second-place OpenAI, making it the "king of cost-effectiveness".

However, judging from the scores, the gap between the top models is actually very small. The industry has entered a stage of close competition.

Google disclosed last month that the Gemini app had reached 650 million monthly active users. Company executives also acknowledged that the "viral spread" of Nano Banana was one of the major reasons for that growth.

Competition in text-to-image generation is no longer only about image quality, but also about speed, comprehension, and ecosystem integration.

Netizens are having a great time. What difference does "real-time internet connection" bring to text-to-image generation?

Whether it's good or not, try it first. Netizens started evaluating it from various angles.

One user tested it on bracelet imagery for a visual design proposal and was so shocked by the result that he declared "design is dead".

Another called it the best image model in the world, saying the details of the generated pictures are almost indistinguishable from real ones.

Someone exclaimed that even the text on each card in the picture was accurate.

Someone simply used it to generate inscriptions. It was fast and good, and the effect was amazing.

Some netizens believe that Nano Banana 2 offers extremely high controllability this time. The details of the characters match the desired effect and are very realistic.

Moreover, no matter how the characters change, they do not become deformed.

The overall look is also less "AI-like".

Making picture books is also a piece of cake.

People seem to have been won over by Nano Banana 2.

Many reviewers are also keenly interested in the new "real-time internet connection" feature. How do images generated with a live internet connection differ from those generated before? How powerful is the feature, and how practical?

Let's look at the official example first. Nano Banana 2 generated a water cycle diagram in a "handmade style": cotton for clouds, paper for mountains, and a glass bowl for seawater. The texture and details are all there. More importantly, the model clearly understands the subject: it lays out the complete chain of evaporation, condensation, precipitation, and collection, every text label is accurate, the correspondences are clear, and there are no gaps in the logic.

One netizen used it to make a recipe, and the result was equally impressive: the layout, sectioning, and step structure all look like a professional design draft. She said bluntly that people have underestimated the "visualization ability" of Nano Banana 2, which she believes will revolutionize the field of information graphics.

More detailed recipe diagrams and popular science diagrams were posted one after another.

It is also quite capable at medical anatomical diagrams. Hand-drawn sketches can be quickly turned into professional popular science illustrations.

This ability to visualize abstract concepts is opening up more room for imagination in text-to-image generation. The technology is no longer just about "generating beautiful pictures"; it has begun to take on the role of organizing knowledge.

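The views expressed in this article are solely those of the author. The 36Kr platform only provides information storage services.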
