
Shunyu Yao's debut at Google: the new Gemini model shatters SOTA records, leaving only 7 humans to defend carbon-based programming

QbitAI (量子位) · 2026-02-13 15:27
Focused on scientific research and engineering.

Facing fierce competition from Claude Opus 4.6 and GPT Codex 5.3, Google has countered with a major upgrade to Gemini 3 Deep Think.

On Codeforces, a competitive programming platform whose contests are widely used as a benchmark, it reached an Elo rating of 3455, equivalent to 8th place in the world.

Only 7 people in the world now rank higher than it. The previous best score by a model was 2727 Elo, set by o3 a year ago.

Gemini 3 Deep Think's capabilities go beyond that. It also set a record of 84.6% on ARC-AGI-2, a leading benchmark for testing AI reasoning ability.

Notably, previous top-performing models hovered between 60% and 70%; Claude Opus 4.6 scored only 68.8%.

On Humanity's Last Exam (HLE), Gemini 3 Deep Think also set a new state of the art (SOTA), scoring 48.4%.

Google said the new Deep Think is a reasoning mode it built specifically to push the frontier of intelligence and tackle modern challenges in science, research, and engineering.

Another "legend" is also on the team: Shunyu Yao, a Tsinghua University physics alumnus and winner of its Special Scholarship, who joined Google DeepMind in September last year and contributed to the new Deep Think model.

The new Deep Think enters the lab

How powerful is the upgraded Gemini 3 Deep Think?

Its ambition goes beyond winning benchmarks: it aims to enter scientific research and engineering and help engineers handle complex tasks.

The new Deep Think can analyze sketches, model complex shapes, and directly generate files ready for 3D printing. One example is a laptop stand it produced from a sketch.

Google VP Josh Woodward posted the printed result on X, and it looks quite faithful to the original sketch.

Lisa Carbone, a mathematician at Rutgers University, used Gemini 3 Deep Think to review a highly specialized mathematics paper.

Gemini 3 Deep Think identified a subtle logical flaw that earlier human peer review had missed.

The Wang An Laboratory at Duke University used Gemini 3 Deep Think to optimize recipes for growing complex crystals, as part of a search for new semiconductor materials.

Gemini 3 Deep Think designed a process that can grow films thicker than 100 microns, hitting a precise target that previous methods had struggled to reach.

On X, XiaoKang Chen, a researcher on DeepSeek's multimodal team, also said that Gemini 3 Deep Think is very good at long-tail scientific tasks.

He fed an image of a complex molecular structure into Deep Think, and the model correctly worked out its molecular formula.

Three new SOTAs and an 82% cut in reasoning cost

Last year, a special version of Deep Think achieved gold-medal-level performance in international competitions such as the IMO. Now the upgraded Deep Think has set new SOTA results across several demanding benchmarks:

Without using any tools, it set a new SOTA of 48.4% on HLE;

It achieved an unprecedented 84.6% on ARC-AGI-2, verified by the ARC Prize Foundation;

It reached an Elo rating of 3455 on Codeforces;

It reached gold-medal level on the 2025 International Mathematical Olympiad.

Of these, ARC-AGI-2 is often described as a "Turing test" for AI: it measures a model's ability to solve novel reasoning tasks it has never seen before.

Notably, the first-generation Deep Think, released just last December, scored 45.1%. In under three months, the score has soared to 84.6%, surpassing Opus 4.6.

On ARC-AGI-1, Gemini 3 Deep Think scored 96%, essentially saturating the benchmark.

While performance has improved, reasoning cost has dropped sharply. The first-generation Deep Think cost $77.16 per task; this upgrade cuts that by 82%, to $13.62 per task.
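As a quick check, the 82% figure follows directly from the two reported per-task costs. This uses only the numbers given above, with no other data assumed:

```latex
\text{reduction} = \frac{77.16 - 13.62}{77.16} = \frac{63.54}{77.16} \approx 0.823 \quad (\approx 82\%)
```

Equivalently, 77.16 / 13.62 ≈ 5.7, so each task now costs roughly one-sixth as much as before.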

With ARC-AGI-1 and ARC-AGI-2 now dominated by Gemini, the ARC Prize is building ARC-AGI-3.

Beyond mathematics and programming, the upgraded Deep Think also performs strongly across scientific fields such as chemistry and physics.

On the written sections of the 2025 International Physics Olympiad and International Chemistry Olympiad, Gemini 3 Deep Think achieved gold-medal-level results.

It also showed strength in advanced theoretical physics, scoring 50.5% on the CMT-Benchmark.

Chinese researchers behind the most powerful reasoning model

The Gemini 3 Deep Think R&D team includes many ethnic Chinese researchers.

Core members include Yi Tay, an ethnic Chinese scientist born in the 1990s who works on reinforcement learning and reasoning on the Gemini team.

He previously co-led early large language model projects at Google Brain, including PaLM-2, UL2, and Flan-2.

After more than three years at Google Brain, Yi Tay briefly left Google from 2023 to 2024 to co-found Reka, a unicorn AI startup.

Reka AI was founded by researchers from DeepMind, Google, and Meta. Its original goal was to build powerful, efficient foundation models; it now also develops tools for interface design, application logic, and other uses.

After a year and a half as a founder, Yi Tay returned to Google DeepMind as a senior research scientist, continuing his work on artificial intelligence and large language models.

Shunyu Yao, a Tsinghua alumnus who moved from Anthropic to Google DeepMind last year, also contributed to the new Deep Think model.

Yao studied physics as an undergraduate at Tsinghua University and received the Tsinghua Undergraduate Special Scholarship, the highest honor the university awards to outstanding undergraduates.

As an undergraduate, he published a high-impact paper in Physical Review Letters (one of the top journals in physics) that presented the world's first topological band theory for non-Hermitian systems. The work accurately predicted related phenomena and defined two new physical concepts.

After graduating, he pursued a PhD at Stanford University, focusing on frontier topics such as quantum many-body chaos and the dynamics of open quantum systems. He studied under well-known scholars including Douglas Stanford (an American theoretical physicist regarded by peers as one of the top young scientists with the potential to change the direction of physics) and Zhenbin Yang.

After his PhD, he did postdoctoral research at UC Berkeley and then joined Anthropic. During his year at Anthropic, he helped establish the reinforcement learning foundations team and was responsible for the Claude 3.7 Sonnet framework and the basic reinforcement learning theory behind the Claude 4 series.

After leaving Anthropic, Yao joined Google DeepMind to continue his AI research. The new Deep Think model is his debut work at Google.
