Google's Gemini 3.1 Pro, the new king, has arrived: it generated a Windows 11-style web OS in one go, built a SimCity-style app, and produces striking SVG animations.
On February 20th, Zhidx reported that Google officially released its new-generation flagship model, Gemini 3.1 Pro, early this morning. According to benchmark results published by Google, Gemini 3.1 Pro, the company's most powerful model for complex tasks, took first place in 12 tests, outperforming models such as Gemini 3 Pro, Claude Opus 4.6, Claude Sonnet 4.6, and GPT-5.2.
Google DeepMind focused on improving the reasoning ability of Gemini 3.1 Pro. On ARC-AGI-2, a general intelligence benchmark widely regarded in the industry as highly difficult, Gemini 3.1 Pro scored 77.1%, surpassing the Claude and GPT models and doubling the score of Gemini 3 Pro.
Shunyu Yao, a legendary figure from the Department of Physics at Tsinghua University who joined Google DeepMind last September, also posted an official announcement about the release of the new model, saying "Better Gemini models are emerging at an irresistible pace."
The following classic comparison of the "pelican riding a bicycle SVG animation" intuitively reflects the improvement in the capabilities of the new model. The pelican's body structure and riding posture generated by Gemini 3.1 Pro on the right are natural and reasonable, and the details of the bicycle frame, chain, pedals, seat, etc. are complete. Compared with the output of Gemini 3 Pro, it conforms to physical common sense and is more like a complete animation scene.
Jiao Sun, an alumnus of Tsinghua University who developed the SVG generation function for Gemini 3.1, commented on X, saying "I'm extremely proud."
Starting today, Google AI Pro and Ultra subscribers can use Gemini 3.1 Pro in the Gemini app and the AI assistant NotebookLM. Free users are limited to two questions to Gemini 3.1 Pro. Developers and enterprise users can access the preview version of Gemini 3.1 Pro through the Gemini API in AI Studio, Antigravity, Vertex AI, Gemini Enterprise, Gemini CLI, and Android Studio.
API pricing for the Gemini 3.1 Pro preview is tiered and matches that of the previous-generation Gemini 3 Pro preview. For prompts of up to 200,000 tokens, input costs $2 per million tokens (approximately RMB 14) and output costs $12 per million tokens (approximately RMB 83). For prompts that exceed 200,000 tokens, input costs $4 per million tokens (approximately RMB 28) and output costs $18 per million tokens (approximately RMB 124).
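The tiered pricing above can be turned into a quick cost estimate. The following is a minimal sketch, not an official billing formula: the function name `estimate_cost_usd` is hypothetical, and it assumes that the tier is chosen by the prompt size alone and that the chosen tier's rates apply to both input and output tokens of that request. Actual invoices may count tokens differently.

```python
# Prices in USD per million tokens, taken from the figures quoted in the article.
THRESHOLD_TOKENS = 200_000
TIERS = {
    "standard": {"input": 2.0, "output": 12.0},   # prompt <= 200,000 tokens
    "long": {"input": 4.0, "output": 18.0},       # prompt > 200,000 tokens
}


def estimate_cost_usd(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request (hypothetical helper, see assumptions above)."""
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Assumption: the prompt size alone decides which tier applies.
    tier = "standard" if prompt_tokens <= THRESHOLD_TOKENS else "long"
    rates = TIERS[tier]
    total = prompt_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    print(estimate_cost_usd(100_000, 10_000))  # short prompt, standard tier
    print(estimate_cost_usd(300_000, 10_000))  # long prompt, higher tier
```

Under these assumptions, crossing the 200,000-token threshold doubles the input rate and raises the output rate by half, so trimming a long prompt to fit the lower tier can cut the cost of a request noticeably.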
01 .
Can Build WebOS, Create "Minecraft",
And Deconstruct Visual Illusions
The core upgrade in Gemini 3.1 Pro is its ability to handle complex tasks. According to Google's blog post, the new model is stronger in advanced reasoning, multimodal understanding, and complex project generation, and copes better with high-difficulty work scenarios. Community tests followed quickly after the release.
Well-known AI blogger Chetaslua showed Gemini 3.1 Pro generating a Windows 11-style WebOS in a single attempt.
Chetaslua said directly in the post: "The last time I shared a similar case, it was very difficult. Now it has become the norm. With the agent system, we can almost do anything with this model."
He also previously posted a video of using Gemini 3.0 Pro to generate the Windows Web operating system. When comparing the two videos, the improvement is very obvious.
The system interface generated by Gemini 3.1 Pro has a full set of application icons, a well-structured start menu layout, and basic window interaction logic. Overall, it is closer to a runnable lightweight operating system.
In contrast, the system generated by the earlier 3.0 Pro was relatively rudimentary, lacking some basic desktop interactions and system-level applications.
In a more engineering-oriented case, a developer used Gemini 3.1 Pro to generate an interactive VoxelWeb project and run it directly in the browser. The result resembles a "Minecraft"-style 3D sandbox.
The interface already includes a start button, movement controls, block interactions, and basic crafting logic, making it a complete prototype of a lightweight sandbox.
For front-end generation and animation detail, a developer asked the model to generate a complete interactive growth animation covering the entire process from seed germination and root formation to stem growth and leaf unfolding.
The test results show that the model performs relatively well in connecting growth stages and presenting leaf details. The developer commented: "This is the best leaf effect I've ever seen with this prompt."
Tests of visual understanding raised the difficulty further. One user specifically probed the "AgenticVision" ability by inputting a seemingly ordinary photo of a street trash can.
The model not only completed the basic recognition but also pointed out that, when squinting or viewing from a distance, the garbage, shadows, and outlines in the picture visually form two cartoon characters sitting side by side. It then broke down how the illusion forms, item by item, explaining how the different fabrics, garbage bags, and shadows correspond to the characters' heads, bodies, and outlines, which demonstrates multi-step visual reasoning.
Overall, Gemini 3.1 Pro has begun to handle higher-order visual cognition tasks such as understanding spatial relationships, mapping shapes, and explaining visual illusions. Developers' overall verdict is that its performance is on par with the current top tier.
We also tested Gemini 3.1 Pro with some trick questions, such as "Whether to drive or walk to a car wash 100 meters away" and "Whether parents can get married". It avoided the traps and answered correctly.
02 .
Create "SimCity" from Scratch
Handle Creative Programming and Interactive Design in Minutes
Google DeepMind's official X account showed Google UX engineer Michael Chang using Gemini 3.1 Pro to develop a realistic city planning application. Gemini 3.1 Pro handled complex terrain on its own, drew infrastructure maps, simulated traffic, and finally generated high-quality visuals.
Beyond the pelican riding a bicycle mentioned above, Gemini 3.1 Pro also excels at generating SVG animations of various whimsical scenes, such as a frog riding an old-fashioned high-wheel bicycle, a giraffe driving a microcar, and an ostrich wearing roller skates. Compared with Gemini 3 Pro, its output is more vivid and story-like overall, with significantly better detail.
For example, Gemini 3.1 Pro can directly generate animated SVGs for websites based on text prompts. Since these animations are constructed with pure code rather than pixels, they can remain clear at any size, and the file size is very small compared to traditional videos.
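To illustrate why code-based animations stay sharp at any size, here is a minimal hand-written sketch, not output from Gemini. The function name `spinning_square_svg` and the shape it draws are purely illustrative assumptions. The drawing is described once in a fixed `viewBox` coordinate system, and only the `width` and `height` attributes change when it is rendered larger.

```python
import xml.etree.ElementTree as ET


def spinning_square_svg(size_px: int) -> str:
    """Return a small animated SVG document rendered at size_px by size_px.

    Illustrative helper: the geometry lives in a 100x100 viewBox, so the
    same markup scales to any rendered size without losing sharpness.
    """
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100" '
        f'width="{size_px}" height="{size_px}">'
        '<rect x="35" y="35" width="30" height="30" fill="teal">'
        # Rotate the square around the center of the viewBox, forever.
        '<animateTransform attributeName="transform" type="rotate" '
        'from="0 50 50" to="360 50 50" dur="4s" repeatCount="indefinite"/>'
        '</rect></svg>'
    )


small = spinning_square_svg(64)
large = spinning_square_svg(2048)

# Both strings are well-formed XML describing the same drawing.
ET.fromstring(small)
ET.fromstring(large)

# The large version differs only in its width/height attribute values.
print(len(small), len(large))
```

Saving either string to a `.svg` file and opening it in a browser should show the same rotating square. The markup for the 2048-pixel version is only a few characters longer than the 64-pixel one, whereas a video of the same animation would grow with resolution.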
The complex reasoning ability of Gemini 3.1 Pro can help users complete designs using complex APIs. In the following case, the model built a real - time aerospace dashboard and successfully configured the public telemetry data stream to visualize the orbital trajectory of the International Space Station.
In terms of interactive design, Gemini 3.1 Pro can write code to generate a complex 3D starling flock flight simulation. It can also create an immersive experience. Users can control the flock of birds through gesture tracking while listening to a generative soundtrack that changes with the dynamics of the flock.
Gemini 3.1 Pro can also perform creative programming, converting literary themes into runnable code. When asked to build a modern personal portfolio website for "Wuthering Heights" by Emily Brontë, the model deeply analyzed the atmosphere and tone of the novel, designed a simple and modern interface, and created a website that captures the spiritual core of the protagonist.
03 .
Excellent in Programming, Reasoning, and Multimodality
Outperforms Claude and GPT Models in Several Tests
Researchers evaluated Gemini 3.1 Pro on a series of benchmarks covering reasoning, multimodal ability, agentic tool use, multilingual performance, and long-context handling.
Compared with Gemini 3 Pro, Claude Sonnet 4.6, Claude Opus 4.6, GPT-5.2, and GPT-5.3-Codex, Gemini 3.1 Pro took first place in 12 benchmarks.
In tests that demand stronger reasoning, Gemini 3.1 Pro outperformed the Claude and GPT models on three benchmarks: Humanity's Last Exam, ARC-AGI-2, and GPQA Diamond.
In the programming tests, Gemini 3.1 Pro scored relatively low on SWE-Bench Pro (public version) and SWE-Bench Verified. These two test sets examine a model's end-to-end engineering ability in real projects: understanding requirements, locating problems, modifying code, and ensuring the result is usable.
GDPval-AA Elo is