
Google goes head-to-head with ByteDance: its latest lightweight model, Nano Banana 2 Lite, generates an image in 4 seconds for just $0.034 and can hand off directly to video generation

AI前线 (AI Frontline) | 2026-07-01 17:33
It undercuts ByteDance on price and leads on both image quality and latency.

Google's Nano Banana 2 Lite has officially launched. It opens a price war in text-to-image generation with a slight price advantage over Seedream 5.0 Lite, and it directly challenges ByteDance, the leading competitor in multimodal AI, by generating images in about 4 seconds and handing off directly to video generation.

Google has just put the lightest "banana" in the Nano Banana family front and center.

Nano Banana 2 Lite is now fully available on Google AI Studio, the Gemini API, and the Gemini Enterprise Agent Platform. Its official model ID is gemini-3.1-flash-lite-image. Its headline numbers line up directly against ByteDance's latest text-to-image model, Seedream 5.0 Lite (released in February 2026):

A single 1K image costs $0.034 to generate, and the average generation time is about 4 seconds.

On API pricing, the two are nearly neck and neck:

  • Nano Banana 2 Lite is priced at $0.034 per 1K-resolution image.
  • Seedream 5.0 Lite costs approximately $0.035 per image (0.22 yuan per image through official domestic channels, and $0.035 per image on mainstream third-party API platforms).

A difference of $0.001 per image is negligible on its own, but in content, e-commerce, gaming, education, and advertising businesses it is multiplied by the number of calls. In tasks such as batch generation, A/B testing, personalized asset creation, and real-time preview, both latency and cost act as "amplifiers".
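To make the "amplifier" effect concrete, here is a minimal sketch of how the $0.001 gap scales with call volume. The two prices are the ones quoted above; the monthly volumes and the function names are hypothetical, chosen only for illustration.

```python
# Per-image prices quoted in the article (USD, 1K resolution).
NANO_BANANA_2_LITE_USD = 0.034
SEEDREAM_5_LITE_USD = 0.035


def monthly_cost(price_per_image: float, images_per_month: int) -> float:
    """Total API spend for one month at a flat per-image price."""
    return price_per_image * images_per_month


def monthly_saving(images_per_month: int) -> float:
    """Spend difference between the two models at the same volume."""
    return (monthly_cost(SEEDREAM_5_LITE_USD, images_per_month)
            - monthly_cost(NANO_BANANA_2_LITE_USD, images_per_month))


# Hypothetical monthly volumes; none of these figures come from the article.
for volume in (100_000, 10_000_000, 1_000_000_000):
    print(f"{volume:>13,} images/month -> saving ${monthly_saving(volume):,.2f}")
```

At these assumed volumes the gap works out to about $100, $10,000, and $1,000,000 per month respectively, which is why a price difference that is invisible per image still matters for high-volume callers.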

More interestingly, on performance, Nano Banana 2 Lite overtakes Seedream 5.0 Lite on two key indicators at once: "aesthetic preference" in text-to-image generation (based on blind human testing) and end-to-end latency.

Third-party data shows that Nano Banana 2 Lite's Text-to-Image Elo is 1251, higher than Seedream 5.0 Lite's 1132. Its latency is about 4.0 seconds, while the end-to-end latency of Seedream 5.0 Lite is as high as 45.1 seconds.

(Note: The latency data comes from Artificial Analysis, an AI model evaluation and data platform. It measures end-to-end time in the API environment, which may include queuing, service-provider wrapping, and image download.)

Of the two indicators, the former affects the visual experience and the latter determines the product form: Elo determines "how good-looking the picture is", and latency determines "whether it can be integrated into product interaction".

In other words, Nano Banana 2 Lite is not just cheaper. At a similar price, it improves both the quality and the response speed of 1K-resolution text-to-image generation.
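For a sense of what a 119-point Elo gap means, the sketch below applies the standard logistic Elo formula to the two ratings quoted above. It assumes the leaderboard uses the conventional 400-point scale; the article does not state the exact rating method, so treat the result as a rough reading rather than the platform's own figure.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head wins for A over B under standard Elo.

    The 400-point scale is the conventional default and is an assumption here.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings quoted in the article.
nano_banana_2_lite = 1251
seedream_5_lite = 1132

p = elo_expected_score(nano_banana_2_lite, seedream_5_lite)
print(f"Expected blind-test preference for Nano Banana 2 Lite: {p:.1%}")
```

Under that assumption, raters would be expected to prefer the Nano Banana 2 Lite image in roughly two out of three head-to-head comparisons, a clear lead but not a rout.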

ByteDance's official statements have not given specific quantitative figures for Seedream 5.0 Lite. They mainly emphasize "a comprehensive improvement in three major capabilities: cross-modal understanding and reasoning, precise instruction following, and real-time online retrieval, enabling every requirement to be responded to in a timely manner and presented accurately".

As for Nano Banana 2 Lite, Google defines it as the fastest and most cost-effective image model in the Nano Banana family, targeting scenarios with high throughput, low latency, and large-scale generation.

Nano Banana 2 Lite does not replace the Pro version; it fills the gap for "high-frequency, large-volume image generation". It gives up multi-resolution output (1K only) and some heavy-duty capabilities, and concentrates its compute on speed and unit cost, addressing the real pain points of "slow and expensive" in current text-to-image scenarios.

In addition, it connects seamlessly to Google's multimodal model Gemini Omni Flash, enabling direct video generation and conversational editing from static images.

4 seconds vs 45 seconds

On price alone, it is hard to say that Google's Nano Banana 2 Lite has an overwhelming advantage over ByteDance's Seedream 5.0 Lite.

However, for an American model provider, simply matching the price ($0.034 vs $0.035) is already quite rare. It looks more like Google actively entering the price-performance battlefield previously dominated by Chinese models.

What really differentiates the two is "production capacity per unit time".

According to third-party data, the advantage of Nano Banana 2 Lite is not the $0.001 saved per image, but that it compresses 1K text-to-image generation to about 4 seconds at a similar price.

This means it is no longer just a standalone image generation tool; it can become part of product interaction and be built into the business process itself.

Users can change a prompt, switch a style, or adjust a background, and see the result in a few seconds. For design tools, e-commerce back ends, advertising platforms, social applications, and game UGC, this "what you see is what you get" instant feedback matters more than being slightly cheaper.
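"Production capacity per unit time" can be estimated directly from the two latency figures quoted earlier. The sketch below assumes each worker sends requests strictly one after another and ignores rate limits and retries; the function name and the concurrency parameter are illustrative, not part of any vendor API.

```python
# End-to-end latencies quoted in the article (seconds per 1K image).
NANO_BANANA_2_LITE_LATENCY_S = 4.0
SEEDREAM_5_LITE_LATENCY_S = 45.1


def images_per_hour(latency_s: float, concurrency: int = 1) -> float:
    """Images completed per hour when each worker issues requests back to back."""
    if latency_s <= 0 or concurrency < 1:
        raise ValueError("latency must be positive and concurrency at least 1")
    return concurrency * 3600.0 / latency_s


fast = images_per_hour(NANO_BANANA_2_LITE_LATENCY_S)
slow = images_per_hour(SEEDREAM_5_LITE_LATENCY_S)
print(f"Nano Banana 2 Lite: {fast:.0f} images/hour per worker")
print(f"Seedream 5.0 Lite:  {slow:.0f} images/hour per worker")
print(f"Throughput ratio:   {fast / slow:.1f}x")
```

With one sequential worker, that is roughly 900 images per hour against roughly 80, a gap of about 11x. Because the quoted latencies are end-to-end API measurements that may include queuing and download time, the ratio is best read as an order-of-magnitude indication.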

These different emphases reflect the different business focus of Google and ByteDance.

ByteDance's multimodal advantage is rooted in its powerful content industry chain, especially in short dramas and short videos, e-commerce, and marketing. Data shows that Seedance's penetration rate in the domestic AI short drama industry has reached about 95%. The 2.0 version alone can bring Volcengine more than 1 billion yuan in monthly revenue.

ByteDance's approach is to serve massive content distribution and monetization, getting closer to "blockbuster content".

Google's advantage comes from developer tools, design ecosystems, cloud platforms, and enterprise workflows. Among the customer cases shown on its blog, there are many specialized tool platforms such as Artlist, Figma, and Manus.

Google itself is more inclined to place it in scenarios such as rapid creativity, advertising A/B testing, and social applications for millions of users. It serves as infrastructure and production tools, getting closer to the "production interface".

To fit these enterprise tool scenarios, which are extremely sensitive to speed and cost, Google has made aggressive engineering optimizations in the implementation.

Compared with the standard and Pro versions of Nano Banana 2, the Lite version significantly reduces the number of model layers and the compute cost of the attention mechanism, and it introduces a more targeted inference strategy:

  • Default "Low-Thinking" mode: As officially defined, the Lite version runs in Low-Thinking mode by default. When generating images, the model skips most of the computation for complex logical reasoning and long-chain planning and samples directly from the trained latent-space mapping. This is the key to bringing latency down to 4 seconds.
  • Targeted operator optimization: To handle high-frequency API calls, the Lite version applies operator fusion and batch-processing optimization on the server side for common 1K-resolution requests. This greatly improves GPU utilization and lowers the inference cost per image, which is why it can be priced at $0.034 per image.

The sweet-spot model for 1K single images

Another easily underestimated indicator for Nano Banana 2 Lite is its human aesthetic preference score (Elo) for text-to-image generation.

In blind-test image generation, Nano Banana 2 Lite scored 1251, which is not only higher than Seedream 5.0 Lite's 1132 but also surpasses the larger-parameter Pro version on some benchmarks.

This result breaks the traditional perception that "the number of parameters determines everything" and shows that Google's lightweight model does not simply sacrifice performance for speed but still maintains strong competitiveness in basic visual experience, prompt following, and image completion.

Its core technical logic is a combination of knowledge distillation and scenario-specific training:

Standing on the shoulders of giants: Although the Lite version is small, it carries a wide range of "knowledge". During training, Google aligned the Lite version with synthetic data generated by larger models in the Gemini 3.1 series (such as Ultra or Pro).

This enables the Lite version to inherit the flagship model's understanding of the physical world and the relationships between complex objects, achieving "strong inheritance of world knowledge".

Abandoning comprehensiveness and focusing on high-frequency scenarios: The Lite version does not try to cover all data. Instead, it applies careful data cleaning and heavier weighting to the most common user prompt scenarios.

This "specialized training" strategy makes it more stable and accurate on common subjects such as landscapes, portraits, and everyday objects than a large model that tries to cover everything.

Google has also specifically "reinforced" detail control, usually the weakest point of lightweight models.

In earlier lightweight models, text rendering (OCR) in images and character consistency across images were often the first things sacrificed. Nano Banana 2 Lite strengthens both abilities through a special loss function design:

OCR-level text generation: By introducing an additional text-perception branch, the Lite version maintains a high character accuracy rate when generating images that contain text, such as posters and UI screens.

Feature anchoring mechanism: To solve the problem of "different results for the same subject" in AI-generated images, the Lite version introduces a more efficient feature anchoring technique that keeps the facial features and clothing details of the same subject highly consistent across multi-round or batch generation.

This is crucial for commercial implementation.

The problem with many lightweight models is that "they are cheap but not reliable": generation is fast, but detail quality is mediocre, and the money saved on API fees is spent on manual screening and regeneration.

The product logic of Nano Banana 2 Lite is very clear: focus its capabilities on the most common, high-frequency 1K single-image scenarios so that every image is "usable", and in that way deliver the last mile of cost reduction and efficiency gains.
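The "cheap but not reliable" trap can be expressed as a cost per usable image. The sketch below assumes each attempt is independently usable with a fixed acceptance rate, so the expected number of attempts per usable image is 1 divided by that rate. The acceptance rates and the $0.020 comparison price are hypothetical; only the $0.034 price comes from the article, and manual screening labor is left out entirely.

```python
def cost_per_usable_image(price_per_image: float, acceptance_rate: float) -> float:
    """Expected API spend to obtain one usable image.

    Assumes independent attempts, so expected attempts = 1 / acceptance_rate.
    """
    if not 0.0 < acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in (0, 1]")
    return price_per_image / acceptance_rate


# Hypothetical scenario: a cheaper model whose output is usable half the time,
# versus the article's $0.034 price with an assumed 90% acceptance rate.
cheap_unreliable = cost_per_usable_image(0.020, acceptance_rate=0.50)
nb2_lite_assumed = cost_per_usable_image(0.034, acceptance_rate=0.90)

print(f"Cheaper but unreliable model: ${cheap_unreliable:.4f} per usable image")
print(f"$0.034 model at 90% usable:   ${nb2_lite_assumed:.4f} per usable image")
```

Under these assumed numbers the nominally cheaper model ends up costing more per usable image, before counting any human review time.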

Images are not the end; videos are

Alongside the launch of Nano Banana 2 Lite, Google also lifted the restrictions on its multimodal model Gemini Omni Flash. The two work in relay within Google's ecosystem:

Nano Banana 2 Lite handles extremely fast image generation, while Omni Flash handles video generation and conversational editing.

This combination makes the Lite not an isolated image-generation tool but the "entrance" to a complete multimedia production chain.

In its performance comparisons, Google also emphasizes Omni Flash's video-editing capabilities.

On the two key dimensions of "Overall Preference" and "Instruction Following", its Elo score ranks first, ahead of other models such as Alibaba's HappyHorse, Kuaishou's Kling v3 Pro, and ByteDance's Seedance 2.0 (946 and 960).

Omni Flash's integrated "image-to-video generation" ability relies on several key architectural design choices.

First, Google has introduced the Interactions API to solve the pain point of "memory loss" in video editing. When you pass the static image generated by the Lite to the Om