
Why did Yao Shunyu call the smallest iteration of Google's Gemini "unstoppable"?

字母AI · 2026-02-21 10:24
Good at handling "super complex tasks"

People are still talking about the comical scene in which the heads of OpenAI and Anthropic refused to shake hands and raised their fists instead. Meanwhile, Google has shipped a new iteration of its model.

This iteration is a wolf in sheep's clothing. Judging by the version number, it is the smallest step Google has taken so far: earlier updates went from Gemini 2.0 to Gemini 2.5, while this one goes from Gemini 3.0 to Gemini 3.1 Pro Preview.

However, this ".1" iteration has made significant progress.

Google CEO Sundar Pichai said that the new-generation model is very good at handling "super complex tasks", such as visualizing complex concepts, synthesizing data into a single view, or turning creative projects into reality.

Yao Shunyu also posted on X to support Gemini 3.1 Pro Preview and praised it:

"Gemini is not just a good model. An even better model is coming irresistibly."

It is worth noting that about a week earlier, Google launched Gemini 3 Deep Think, a "dedicated reasoning mode" designed for complex, open-ended questions in science, research, engineering, and similar fields.

Gemini 3 Deep Think is the first project Yao Shunyu took part in after moving from Anthropic to Google DeepMind.

Gemini 3.1 Pro Preview is closely tied to Gemini 3 Deep Think. Google says it is "directly built on the experience and technology of Gemini 3 Deep Think", which amounts to handing Deep Think's core reasoning improvements down to the more widely available Pro model.

01

What can Gemini 3.1 Pro Preview do?

Since the outstanding ability of this new-generation model lies in handling "super complex" tasks, let's put ordinary conversations aside. Google's official blog post focuses on several examples to show its capabilities.

First, create SVG animations through simple prompts.

This function also existed in the previous generation, but the improvement is obvious in comparison.

For example, take the prompt "Generate an SVG depicting a chameleon sitting quietly on a branch. Make the chameleon's eyes follow the user's mouse cursor on the screen."

The animation generated by Gemini 3 Pro has a plain white background, and the chameleon looks dull, with both eyes even drawn on the same side.

The animation generated by Gemini 3.1 Pro has a rich "dark green jungle" background. The chameleon's body is decorated with yellow stripes and dots, its eyes are three-dimensional, and its legs are naturally bent.

Another example is the prompt "Generate an SVG of a sliding switch. When the mouse hovers over the sun icon, turn it into a glowing moon, and at the same time, smoothly fade the background from bright to dark. Adopt a clean and flat UI style."

The animation produced by Gemini 3 Pro completes the task, and the icon changes with the mouse. However, it relies on a single icon, a circle with a piece missing, colored yellow for daytime and white for night.

The animation generated by Gemini 3.1 Pro is much more complex. It shows a yellow sun and white clouds during the day and a crescent moon and stars at night, with a smooth transition between the two sets of icons.

In short, the animations made by Gemini 3 Pro remind people of the meme about "learning animation for three years" from many years ago.

The SVG animations delivered by Gemini 3.1 Pro, by contrast, are good enough to use as they are.

Second, build engineering-level systems.

From a single, highly complex natural-language instruction, Gemini 3.1 Pro can directly generate a complete interactive system rather than a simple page demo. The result integrates 3D rendering, real-time solar ephemeris calculation, asynchronous API fetching, and physical lighting effects.

In the example given by Google, the user gave a text instruction, and Gemini 3.1 Pro generated a high-fidelity, interactive 3D International Space Station (ISS) orbit tracker. It uses high-resolution Blue Marble texture maps to render a detailed 3D Earth model.

Third, generate interactive creative systems.

In another example, Google demonstrated a complex 3D starling murmuration simulation written by Gemini 3.1 Pro.

It not only generates visual code but also builds an immersive experience. Users can control the flock of birds through hand tracking and listen to the generative soundtrack that changes according to the movement of the birds.

For researchers and designers, this provides a powerful way to prototype sensory-rich interfaces.

Fourth, transform literary themes into runnable code.

This example may be the one that ordinary people can most easily appreciate.

When asked to create a modern personal portfolio website for Emily Brontë's "Wuthering Heights", the model did not simply summarize the text content. Instead, it reasoned based on the atmosphere and emotions of the novel, designed a simple and contemporary interface, and created a website that captures the spiritual core of the protagonist.

There's no need to elaborate on the value of abstract reasoning ability.

02

How powerful is it?

A new-generation model usually has to prove itself on the leaderboards.

The ".1" upgrade has achieved significant improvements.

According to the test results released in Google's official blog post, 3.1 Pro reached a verified score of 77.1% on the ARC-AGI-2 benchmark, more than double the reasoning performance of 3 Pro.

This is also consistent with the examples of 3.1 Pro, as this test evaluates the model's ability to solve new logical patterns. In simple terms, it's the ability to solve puzzles through abstract reasoning.

In addition, in the GPQA Diamond (scientific knowledge test), 3.1 Pro scored 94.3%; on the agent-based benchmark MCP Atlas, it scored 69.2%; on the BrowseComp benchmark for real-world web browsing and information integration ability, it scored 85.9%.

These scores exceed those of Anthropic's Sonnet 4.6, Opus 4.6, and OpenAI's GPT-5.2 and GPT-5.3-Codex.

Gemini 3.1 Pro pulls well ahead on the ARC abstract reasoning and BrowseComp search tasks, which shows a clear lean toward agentic work rather than being just a knowledge model.

In addition, Artificial Analysis, a third-party organization specializing in large-model benchmarking and comparative analysis, has released its own test results. Gemini 3.1 Pro leads in 6 of the 10 evaluations that make up the Artificial Analysis Intelligence Index. Compared with Gemini 3 Pro Preview, it has improved significantly in several areas, especially reasoning, knowledge, coding, and reducing hallucinations.

Moreover, Gemini 3.1 Pro Preview maintains a high token efficiency.

Running the complete Artificial Analysis Intelligence Index requires about 57 million tokens (1 million more than Gemini 3 Pro Preview).

This token usage is lower than that of other cutting-edge models running in the maximum inference mode, such as Opus 4.6 (max) and GPT-5.2 (xhigh).

Combined with its lower per-token pricing, Gemini 3.1 Pro Preview has a cost advantage among cutting-edge models. Running the complete Intelligence Index costs less than half as much as with Opus 4.6 (max), though still about twice as much as with the leading open-source model GLM-5.

03

Double the ability, same price

Google's official API pricing shows that the charging structure for Gemini 3 Pro/3.1 Pro Preview is based on tokens:

For prompts of up to 200k tokens, input costs about $2 per million tokens and output costs $12 per million tokens. For prompts above 200k tokens, input costs $4 per million tokens and output costs $18 per million tokens.

In terms of context caching, depending on the scale of the prompt, a fee of $0.20 to $0.40 per million tokens is charged, plus a storage fee of $4.50 per million tokens per hour.
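
To put the long-context and caching figures above in concrete terms, here is a minimal Python sketch that compares resending a 300k-token prompt prefix on every call with caching it once. It uses only the rates quoted for prompts above 200k tokens ($4 input, $18 output). It assumes that the $0.40 end of the caching range applies at that size, that the prefix is billed once at the normal input rate when the cache is created, and that storage is billed for the full hour. The function names, the simplified billing model, and the token counts are illustrative and are not Google's actual invoicing logic.

```python
from dataclasses import dataclass

PER_MILLION = 1_000_000


@dataclass(frozen=True)
class LongContextRates:
    """USD rates for prompts above 200k tokens, taken from the article."""
    input_per_m: float = 4.00          # input, per million tokens
    output_per_m: float = 18.00        # output, per million tokens
    cached_per_m: float = 0.40         # assumed: upper end of the caching fee range
    storage_per_m_hour: float = 4.50   # cache storage, per million tokens per hour


def cost_without_cache(prefix, fresh, output, calls, rates):
    """Every call resends the shared prefix plus its own fresh input."""
    per_call = (prefix + fresh) * rates.input_per_m + output * rates.output_per_m
    return calls * per_call / PER_MILLION


def cost_with_cache(prefix, fresh, output, calls, hours, rates):
    """The prefix is cached once, then read at the cached rate on each call."""
    # Assumption: creating the cache bills the prefix once at the input rate.
    create = prefix * rates.input_per_m
    per_call = (
        prefix * rates.cached_per_m
        + fresh * rates.input_per_m
        + output * rates.output_per_m
    )
    # Assumption: storage is billed per million cached tokens per hour.
    storage = prefix * rates.storage_per_m_hour * hours
    return (create + calls * per_call + storage) / PER_MILLION


if __name__ == "__main__":
    rates = LongContextRates()
    # Hypothetical workload: 20 calls in one hour over a 300k-token prefix.
    plain = cost_without_cache(300_000, 2_000, 1_000, 20, rates)
    cached = cost_with_cache(300_000, 2_000, 1_000, 20, 1.0, rates)
    print(f"without cache: ${plain:.2f}")
    print(f"with cache:    ${cached:.2f}")
```

Under these assumptions, 20 calls in one hour cost about $24.52 without caching and about $5.47 with it. The gap narrows as the number of calls falls or the cache is kept alive longer, and for a single call the cached path is the more expensive one.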

Overall, this pricing is the same as that of the previous-generation Gemini 3 Pro. It is also cheaper than Anthropic's Opus series, whose input/output prices can be around $5/$25 per million tokens.

Especially considering its outstanding model capabilities at present, this price is even more competitive.

Don't forget, what Google released this time is just a "Preview". Google will soon launch the official version. And the ".1" iteration is just a small demonstration of its capabilities.

Currently, developers can use 3.1 Pro in AI Studio, Gemini API, Gemini CLI, the agent development platform Google Antigravity, and Android Studio; enterprise users can use it in Vertex AI and Gemini Enterprise; ordinary users can use it in the Gemini app and NotebookLM, but the latter is only available to Pro and Ultra subscribers.

Many people in various communities can't wait to try it out. Just like Google's demonstration, they have created many amazing things.

One user had Gemini 3.1 Pro generate an interactive 3D car suspension simulator with mechanical-level detail, including realistic geometry, linkage constraints, and real-time steering and travel calculations. That amounts to writing a runnable tool that combines mechanical engineering modeling, physical logic, and 3D visualization in one go, which comes close to engineering-grade prototyping.

Someone used 3.1 Pro to create a loop animation of a "ghost hunter walking through a haunted house" and exclaimed, "Gemini is not joking."

In short, Google has made a big move this time.

The small ".1" iteration has significantly improved the reasoning and code abilities, and the pricing is still stable.

The enthusiasm in the community for creating demos also proves its capabilities and practicality.

The AI circle is becoming more and more practical. No matter how powerful a model is, it ultimately depends on whether it's worth the cost. Enterprises are starting to calculate the return on each token, and developers also need