
Just now, the most powerful AI of the year has made its debut. Elon Musk and Sam Altman have given their thumbs up to Gemini 3. After experiencing it, I found that ChatGPT should be worried.

爱范儿 (ifanr) · 2025-11-19 07:59
A new king ascends the throne, and everyone praises him.

Just now, the preview version of Gemini 3 Pro was officially released.

The year-end AI scene is always busy, but this year it is especially noisy. Barring surprises, this is the most anticipated overseas large model to close out 2025. One could even say that Gemini 3 Pro is the sole protagonist of this release window.

Over the past two months, Google has almost replicated Sam Altman's marketing approach. From Logan Kilpatrick, Gemini's well-known promoter, to CEO Sundar Pichai, Google staff repeatedly dropped hints on social platforms, fanning the flames and steadily raising outside expectations for Gemini 3.

Interestingly, OpenAI CEO Sam Altman also just posted on the X platform, saying, "Congratulations to Google on successfully launching Gemini 3!! It looks like a great model."

As Sam Altman's own track record shows, this riddle-like marketing approach is extremely risky: if the product's capabilities fall short, the reputation collapses instantly. But Google is clearly confident in its product. So what kind of report card has Gemini 3 Pro actually delivered this time?

The summary is as follows:

The preview version of Gemini 3 Pro natively supports multiple modalities (text, images, videos, audio)

It tops the LMArena leaderboard and leads comprehensively in mainstream tests such as reasoning, multimodality, and programming

It sets a record in reasoning ability (91.9% in GPQA Diamond, 23.4% in MathArena Apex)

It offers a Deep Think mode (to be opened in the next few weeks)

It has a 1-million-token context window + 64K output

It launches a new AI IDE: Google Antigravity, and the new model is integrated with tools such as Cursor, GitHub, and JetBrains

A generation example of Gemini 3, from DeepMind CEO Demis Hassabis

Deserving of the 'Pro' Name, Google's Most Powerful AI Model Released at Night

According to Google, Gemini 3 Pro is currently its "smartest and most adaptable model", designed to solve complex real-world problems, especially tasks that require higher-level reasoning, creativity, strategic planning, and step-by-step improvement.

Its typical use cases include agentic applications that act autonomously, advanced programming, understanding of extremely long contexts, cross-modal processing (such as combining text, images, and audio), and algorithm development.

The preview version of Gemini 3 Pro ranks first on the LMArena leaderboard with a score of 1501, far exceeding its predecessor in almost all major AI benchmark tests. More importantly, it can not only recognize the content of images but also understand the implied information and contextual relationships within them.

Specifically, on reasoning, it scored 37.5% on the doctoral-level reasoning benchmark "Humanity’s Last Exam", reached 91.9% on GPQA Diamond, and set a new industry record of 23.4% on MathArena Apex.

In terms of multimodal reasoning, it scored 81% on MMMU-Pro and 87.6% on Video-MMMU, and achieved a 72.1% factual accuracy rate on SimpleQA Verified.

This means that Gemini 3 Pro can reliably provide high-quality answers to complex problems in science, mathematics, and other fields. Its responses also offer real insight, telling you what you need to know, not just what you want to hear.

In addition to the regular mode, Gemini 3 also offers an option called Deep Think.

Deep Think mode scored 41.0% on "Humanity’s Last Exam", rose to 93.8% on GPQA Diamond, and achieved an unprecedented 45.1% on ARC-AGI-2.

However, this mode is still undergoing safety evaluation and is expected to open to Google AI Ultra subscribers in the next few weeks.

Beyond the test data, the performance of Gemini 3 in real-world application scenarios is even more worthy of attention.

For example, if you dig out that handwritten family recipe book at home, where your grandma wrote the cooking methods in multiple languages, Gemini 3 Pro can recognize these handwritten words and organize them into a shareable recipe book.

Or if you want to learn about a new field, it can process academic papers and long-form video lectures and generate interactive flashcards. It can even analyze video of your ball game and generate a targeted training plan.

This is because Gemini was designed from the start for multimodal understanding and can integrate information types such as text, images, video, audio, and code, backed by a context window of up to 1 million tokens and a maximum output of 64K tokens.

It's worth mentioning that the real highlight is in search. This is the first time Gemini has been directly integrated into Google Search on the day of its release. Clearly, Google wants to reconstruct the search experience with this.

It not only significantly improves Search's ability to understand complex questions and dig up information, but can also generate dynamic visual interfaces, interactive tools, and simulations in real time based on the query, such as a three-body physics simulator or a loan calculator.

Additionally, there are also highlights in the technical architecture of Gemini 3 Pro.

It uses a Transformer-based sparse Mixture of Experts (MoE) model that natively supports multimodal inputs such as text, vision, and audio. The core advantage of this architecture is that the model dynamically selects and activates only a subset of its parameters based on the content of each input token, striking a balance between compute consumption, serving cost, and total model capacity.
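
To make the routing idea concrete, here is a minimal toy sketch of top-k expert selection. It is only an illustration of the general sparse MoE technique, not Gemini's actual implementation: the function names, the choice of k = 2, the four "experts", and the router scores are all assumptions made up for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k=2):
    """Pick the k highest-scoring experts for one token.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_layer(x, router_logits, experts, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_token(router_logits, k))

# Four hypothetical "experts"; each one simply scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical router scores for a single token.
logits = [0.1, 2.0, -1.0, 1.0]

selected = route_token(logits)
output = moe_layer(10.0, logits, experts)
print([i for i, _ in selected])  # experts 1 and 3 are activated; 0 and 2 are skipped
print(output)
```

Only two of the four experts run for this token, which is the source of the savings described above: total capacity grows with the number of experts, while per-token compute grows only with k.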

At the hardware level, Gemini 3 Pro is trained on Google's self-developed Tensor Processing Units (TPUs). Compared with CPUs, TPUs are faster at the large-scale computations required by large language models and are equipped with large-capacity, high-bandwidth memory, enabling them to handle extremely large models and batches of data.

If you are a developer, the changes brought by Gemini 3 will be more direct.

Google's official blog claims that Gemini 3 is currently the most powerful "vibe coding" model: you just describe what you want in natural language, and it can generate fully functional interactive applications.

The data speaks for itself: 1487 Elo on the WebDev Arena leaderboard, a 54.2% score on Terminal-Bench 2.0, and a 76.2% score on SWE-bench Verified.

Google also launched a new AI IDE this time: Google Antigravity.

The built-in agent can independently plan and execute complex end-to-end software tasks and automatically verify the correctness of the code. If you want to create a flight tracking application, the agent can plan the work, write the code, and verify the result in the browser on its own. It can even work across the editor, terminal, and browser at the same time.

In terms of long-term planning ability, Gemini 3 ranks first on the Vending-Bench 2 leaderboard.

In practical applications, the newly released experimental Gemini Agent feature can execute complex multi-step workflows from start to finish. If you say, "Organize my inbox," it will prioritize your to-do items and draft email replies for you to confirm.

Or if you say, "Research and book a mid-size SUV for me. The budget is no more than $80 per day. Use the information in my email to arrange next week's trip," Gemini will locate flight information, compare car rental options, and prepare the booking process for you.

You stay in control throughout the process, and Gemini will request confirmation before important operations.

Additionally, in Google AI Studio and Vertex AI, the price for using the preview version of Gemini 3 Pro through the Gemini API is $2 per million input tokens and $12 per million output tokens. You can also use it for free in Google AI Studio, subject to usage limits.
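
As a quick worked example of what those rates mean per request, the sketch below estimates the cost of a single API call. It assumes the two flat rates quoted above apply to every token, which may not match actual billing (for instance, if tiers or discounts exist); the function name and the token counts are hypothetical.

```python
# Rates quoted in the article, in USD per one million tokens.
INPUT_USD_PER_MILLION = 2.0
OUTPUT_USD_PER_MILLION = 12.0

def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of one request, assuming flat per-token pricing."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000

# Hypothetical request: a 200,000-token prompt and an 8,000-token reply.
cost = estimate_cost_usd(200_000, 8_000)
print(f"${cost:.3f}")  # $0.496
```

Output tokens cost six times as much as input tokens at these rates, so a long generated answer can matter more to the bill than a long prompt.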

Gemini 3 has been integrated into the development tool ecosystems such as Cursor, GitHub, JetBrains, and Replit.

Along with the product release, Google has simultaneously opened multiple access points.

Starting today, the preview version of Gemini 3 is rolling out gradually: all users can use it in the Gemini app; Google AI Pro and Ultra subscribers can try it in AI Mode in Search; developers can access it through the Gemini API, Google Antigravity, and Gemini CLI; and enterprise users can get it through Vertex AI and Gemini Enterprise.

Here Comes ChatGPT's Rival

How 'Competitive' is Gemini 3 in Practical Tests?

Of course, technology companies always promise more than they deliver, so we ran several hands-on tests of our own.

The first challenge was to have it recreate a complete Game Boy handheld console in a single HTML file, with classic games like "Tetris" and "Pokémon Red/Blue" built in, and with all controls supporting both keyboard and touchscreen input.

To be honest, I didn't have high expectations for this requirement.

A task that requires handling UI design, game logic, and sound all at once would take even professional front-end engineers several days. But Gemini's result was a surprise: I would rate the interactive interface six or seven out of ten, and the buttons played the iconic sound effects when pressed. For code generated in a single pass, it was quite competitive.

Since the retro game console could run, we increased the difficulty.