Gemini 3 Flash has pulled off a stunning reversal: on a key benchmark, it actually surpasses Pro.
On December 17th, Google officially released Gemini 3 Flash. A "lightweight" model priced at roughly 1/5 of Claude and 1/4 of GPT, it outperforms Claude Sonnet 4.5 in coding, beats it decisively in reasoning and multimodality, and trades wins and losses with GPT-5.2.
(Chart: MMMU-Pro multimodal evaluation results)
Even more striking, it surpasses its own flagship: on SWE-bench, Gemini 3 Flash scores 78% to Gemini 3 Pro's 76.2%, the first time since the Flash line began that a Flash model has beaten its contemporary Pro.
The numbers may still feel abstract, so here is what it can do in practice:
- Gemini 3 Flash can generate a complete, animated, procedural 3D room in a single attempt.
- It can generate a playable game from a single-sentence prompt.
- Resemble AI uses Gemini 3 Flash to analyze deepfake videos in real time. Their product must instantly turn complex audio-visual forensics data into analysis results ordinary people can understand. In testing, they found 3 Flash's multimodal analysis to be 4 times faster than 2.5 Pro's, processing raw technical output without slowing the critical workflow.
A month ago, the release of Gemini 3 Pro and Deep Think returned Google to the top tier of AI. Gemini 3 Pro topped LMArena, and Deep Think scored three times as high as other models on ARC-AGI. Since launch, the Gemini API has processed more than 1 trillion tokens per day on average. Now, the arrival of Flash completes the Gemini 3 family.
But this Flash is different. The old understanding of Flash was clear: fast and cheap, but with compromised capability; if you wanted speed, you accepted less intelligence. Gemini 3 Flash breaks that convention, offering flagship-level capability at a lightweight model's price.
Why can it compete with flagships at 1/5 of the price?
Let's first compare it with other models.
On the doctoral-level scientific reasoning benchmark GPQA Diamond, Gemini 3 Flash scored 90.4%, significantly leading Claude Sonnet 4.5's 83.4% and approaching GPT-5.2's 92.4%. On the multimodal understanding benchmark MMMU-Pro, Flash scored 81.2%, exceeding GPT-5.2's 79.5% and leaving Claude Sonnet 4.5 more than a dozen percentage points behind.
On Humanity's Last Exam (without tools), Gemini 3 Flash scored 33.7% to Claude Sonnet 4.5's 13.7%, a gap of 20 percentage points.
Its coding ability is also remarkable. On the SWE-bench Verified, Gemini 3 Flash scored 78%, exceeding Claude Sonnet 4.5's 77.2% and for the first time surpassing its own 3 Pro's 76.2%.
Factor in price and it is more striking still. Flash costs about 1/5 of Claude and 1/4 of GPT, yet ties or leads on multiple benchmarks. If choosing Flash once meant fast and cheap but capability-compromised, it now means saving money and getting excellent performance.
Then a question naturally arises: what's the use of Gemini 3 Pro?
Extreme reasoning. On GPQA Diamond, Pro scores 91.9% to Flash's 90.4%; on Humanity's Last Exam, 37.5% to Flash's 33.7%. Pro is also the only model in the family with the Deep Think mode. Google's new division of labor: Pro for extreme reasoning, Flash for high-frequency agent tasks.
But for most scenarios, Flash is not only sufficient but also has an extremely high cost-performance ratio.
The efficiency gains are also significant. In Artificial Analysis's tests, Gemini 3 Flash is 3 times faster than 2.5 Pro and uses about 30% fewer tokens on average in everyday tasks. Pricing is $0.50 per million input tokens and $3 per million output tokens, only a quarter of 3 Pro's.
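As a rough sanity check on these prices, here is a small cost estimator in Python. The per-token rates are the ones cited above; the workload numbers are made up purely for illustration:

```python
# Cost estimator at the article's published Gemini 3 Flash list prices:
# $0.50 per 1M input tokens, $3.00 per 1M output tokens.
FLASH_INPUT_PER_M = 0.50
FLASH_OUTPUT_PER_M = 3.00

def flash_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload at Gemini 3 Flash list prices."""
    return (input_tokens / 1_000_000 * FLASH_INPUT_PER_M
            + output_tokens / 1_000_000 * FLASH_OUTPUT_PER_M)

# Hypothetical workload: 10M input tokens and 2M output tokens per day.
daily = flash_cost_usd(10_000_000, 2_000_000)
print(f"${daily:.2f} per day")  # 10 * 0.50 + 2 * 3.00 = $11.00 per day
```

At that volume, the same workload at a flagship price roughly 4 to 5 times higher would cost $45 to $55 per day, which is where the "1/5 the price" framing starts to matter for always-on services.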
Gemini 3 Flash pushes out the optimal frontier of the performance, cost, and speed trade-off.
Google's official statement is: "Speed and scale don't have to come at the cost of intelligence." In the past, this might have been just a slogan for Flash, but this time the data really supports it.
Flagship experience for free users
The release of Gemini 3 Flash is not just about the API. It will directly change the daily experience of ordinary users.
In the Gemini App, Gemini 3 Flash replaces 2.5 Flash as the new default model. This means all Gemini users worldwide, including free users, are automatically upgraded to a Gemini 3-level experience without paying or changing any settings.
The upgraded App will offer three modes for users to choose from:
- Fast: By default, it is powered by Gemini 3 Flash and can quickly answer daily questions.
- Thinking: Also powered by 3 Flash, but with its "deep thinking" ability activated, designed for complex logic.
- Pro: It continues to use Gemini 3 Pro and is the first choice for handling high-difficulty math and code problems.
In Google Search, the default model of AI Mode will also be upgraded to 3 Flash globally. Google said that thanks to the powerful reasoning and multimodal capabilities of 3 Flash, AI Mode can now understand user intentions more accurately, handle more complex problems with multiple constraints, and generate clear and easy-to-digest answers.
For users in the United States, Google has also opened more options. They can choose "Thinking with 3 Pro" in AI Mode to get more in-depth help, and the image generation model Nano Banana Pro is also open to more US users.
For ordinary users, this might be the most noticeable upgrade. When you open Gemini, it's already a cutting-edge model; when you ask complex questions in Google Search, there's a top-level large model engine working behind the scenes. In other words, the default model used by free users now has capabilities comparable to the paid flagships of other companies.
Developers: Save money and gain more
In the past, when developing agentic applications, if you wanted to use a flagship-level model, you had to pay a flagship-level price. Gemini 3 Flash has changed this situation.
In the past, developers faced a dilemma: use small models that were fast but less intelligent and sacrifice task quality, or use large models that were intelligent but slow and expensive and absorb both latency and cost. In agent scenarios that demand multi-round calls and high-frequency iteration, this trade-off was almost unavoidable.

Gemini 3 Flash offers a new option: fast enough, smart enough, and cost-controlled. Its 78% on SWE-bench shows it can handle complex coding tasks; its 3x speed over 2.5 Pro suits latency-sensitive, real-time scenarios; and its price, 1/5 of competitors', makes large-scale deployment practical.
Currently, Gemini 3 Flash has been launched in preview on the following platforms:
- Google AI Studio and Gemini API
- Gemini CLI
- Android Studio
- Vertex AI (for enterprises)
- Google Antigravity: This is a newly launched agentic development platform by Google, specifically designed for AI-driven software development processes, allowing AI Agents to directly operate editors, terminals, and browsers
For high-frequency call scenarios, Google also provides a matching cost-optimization toolkit. Context caching cuts the cost of repeated input tokens by 90% once usage passes a threshold; the Batch API supports asynchronous bulk processing at a further 50% discount, with a higher call quota. For teams running large-scale agent workloads in production, the combination is compelling.
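To see why this combination matters at scale, here is back-of-envelope arithmetic using the discount figures cited above (90% off cached input tokens, 50% off batch processing). How these discounts actually compose in Google's billing is an assumption here; check the official pricing documentation before relying on it:

```python
# Back-of-envelope savings from context caching plus the Batch API.
# The 90% and 50% discount figures come from the article; treating them
# as multiplicative on the same bill is an assumption for illustration.
INPUT_PER_M = 0.50  # USD per 1M input tokens (Gemini 3 Flash list price)

def input_cost(total_tokens: int, cached_fraction: float,
               batch: bool) -> float:
    cached = total_tokens * cached_fraction
    fresh = total_tokens - cached
    # Cached input tokens billed at 10% of the normal rate (90% discount).
    cost = (fresh + cached * 0.10) / 1_000_000 * INPUT_PER_M
    # Batch API halves the bill on top of that (assumed multiplicative).
    return cost * 0.5 if batch else cost

# 100M input tokens/day with 80% served from cache, sent via the Batch API,
# versus the same volume with no optimization at all:
print(f"${input_cost(100_000_000, 0.8, batch=True):.2f}")
print(f"${input_cost(100_000_000, 0.0, batch=False):.2f}")
```

Under these assumptions the optimized bill is $7 per day against $50 unoptimized, a 7x reduction, which is why the caching-plus-batch combination is pitched at teams running repetitive, large-scale agent workloads.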
The meaning of Flash has changed
The release of Gemini 3 Flash, to some extent, redefines the meaning of the "Flash" category.
Flash is no longer just about speed and efficiency.
In the past, the positioning of Flash, or lightweight models, was very clear: sacrificing capabilities for speed and cost advantages. Choosing Flash meant accepting the compromise in intelligence. However, Gemini 3 Flash proves another possibility. When the underlying foundation model is strong enough, the lightweight version doesn't have to sacrifice too many capabilities. It can just be a "more efficient fully-equipped version."
Google mentioned in its blog that the core model capabilities of Gemini 3 Flash have reached such a high level that in many tasks, 3 Flash with the thinking mode turned off performs better than the 2.5 version with the thinking mode turned on. In the past, you had to sacrifice speed for accuracy. Now, you don't have to.
This release also officially completes the lineup of the Gemini 3 family: Gemini 3 Pro, Gemini 3 Deep Think, and Gemini 3 Flash, covering the full range of needs from casual users to hardcore developers. If you want extreme reasoning depth, use Deep Think; if you want the strongest comprehensive capabilities, use Pro; if you want something fast, good, and cheap, use Flash. Everyone can choose according to their needs, and it's no longer a single-choice question.
Judging from the data, Google is making steady progress on AI productization: the Gemini App's monthly active users have passed 650 million, up from 450 million just last quarter; the developer count has reached 13 million; and API call volume has tripled year-on-year.
Currently, ordinary users can directly experience the new model in the Gemini App and Google Search's AI Mode; developers can start building applications through Google AI Studio and the Gemini API.
When Google offers flagship-level capability at a Flash price, 1/5 that of competitors, it throws the Flash category's possibilities wide open.
This article is from the WeChat public account "Silicon Star People Pro", author: Zhou Yixiao. Republished by 36Kr with permission.