
Stop arguing. Let me do it: Gemini 3 generates everything.

果壳 (Guokr), 2025-11-19 08:06
The only thing left that can set people apart is imagination.

After keeping everyone in suspense for a long time, Gemini 3 finally launched last night. With almost terrifying capabilities, it outperforms the other major models on nearly every benchmark.

It can generate 3D models, build websites, and even create an open-world game from a single sentence...

You can already try the preview version of Gemini 3 Pro in Google AI Studio. It will also come soon to the more consumer-oriented Gemini website and app.

I'm not targeting anyone. I'm just saying that all of you here...

Gemini 3 Pro's results are outstanding. It not only far outshines its predecessor, Gemini 2.5 Pro, but also outperforms Claude Sonnet 4.5 and GPT-5.1 on every benchmark except "Solving Real GitHub Problems (SWE-bench Verified)."

It's like a class with a few top students who are each good at certain subjects, and then along comes a well-rounded overachiever who gets full marks in everything. Isn't it annoying? Isn't it scary?

There are several items in the overachiever's report card that deserve special attention.

In the ARC-AGI-2 test, Gemini 3 Pro leads the second-place Claude Sonnet 4.5 by a huge margin with a score of 31.1%. This is a high-difficulty test used to evaluate an AI's abstract reasoning ability and is considered an important standard for measuring the level of general artificial intelligence.

AIME 2025 and MathArena Apex represent the ability to solve mathematical problems. Among them, Gemini 3 Pro scored 23.4% in MathArena Apex. Don't be deceived by the seemingly low score. Its competitors scored less than 2%, probably because they couldn't even understand the questions.

The ScreenSpot-Pro and Vending-Bench 2 tests are quite interesting. The former evaluates whether an AI can understand and operate a user interface the way a human does, while the latter tests an AI's ability to carry out tasks over long time horizons and across complex scenarios.

To put it simply, Gemini has become what Siri has always wanted to be but couldn't.

Suppose your unlucky boss (I'm talking about yours) suddenly moves the meeting to the evening, and you're worried that you won't make it to your daughter's performance afterward. When you ask the AI, it will access various data on your phone, such as the end time of the meeting, the time of the performance in your calendar, and the traffic conditions during that period, and then determine whether you can get there in time.
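The check described above boils down to simple time arithmetic. Here is a minimal sketch in Python; the function name `can_make_it`, the 10-minute buffer, and all the times are hypothetical values chosen for illustration, not anything Gemini actually exposes.

```python
from datetime import datetime, timedelta


def can_make_it(meeting_end: datetime,
                event_start: datetime,
                travel: timedelta,
                buffer: timedelta = timedelta(minutes=10)) -> bool:
    """Return True if leaving at meeting_end and travelling for `travel`
    gets you there at least `buffer` before event_start."""
    return meeting_end + travel + buffer <= event_start


# Hypothetical inputs: the meeting ends at 18:30, the performance starts at 19:30.
meeting_end = datetime(2025, 11, 19, 18, 30)
performance = datetime(2025, 11, 19, 19, 30)

# Light traffic: 18:30 + 40 min + 10 min buffer = 19:20, which is before 19:30.
print(can_make_it(meeting_end, performance, timedelta(minutes=40)))  # True
# Heavy traffic: 18:30 + 55 min + 10 min buffer = 19:35, which is too late.
print(can_make_it(meeting_end, performance, timedelta(minutes=55)))  # False
```

The hard part for an assistant is not this comparison but gathering the three inputs from your calendar and live traffic data, which is what the agent permissions are for.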

While the others have been in the "preparation" stage for two years, Google has already delivered the product.

Within the Google ecosystem, once you enable Gemini Agent mode and grant it permission, Gemini can access data from all your Google devices and handle scenarios like the one above.

For example, you can tell Gemini, "Based on the information in my emails, help me book a mid-size SUV for my trip next week, with a daily rental fee of no more than $80." Then, when you get off the plane, you can just pick up the car.

Moreover, it posted the highest score to date, 37.5%, on "Humanity's Last Exam," known as "the last closed-book exam for humanity," far exceeding the 26.5% of second-place GPT-5.1.

That is to say, Gemini 3 Pro is currently the model closest to a "human generalist."

It doesn't end there. With Gemini 3 Deep Think (the deep thinking mode) enabled, its score on "Humanity's Last Exam" reaches 41% without using tools. In addition, on GPQA Diamond, which poses complex scientific problems requiring strict logic and professional knowledge, Gemini 3 Deep Think scored 93.8%.

On the ARC-AGI-2 test mentioned above, Gemini 3 Deep Think scored an astonishing 45.1%, completely overwhelming Gemini 2.5 Pro, which managed only 4.9%.

Designers are in danger

In the past, if you wanted to develop an app or a website, designers first had to draw the UI and the various assets, programmers then wired them up in code, and only then could an interactive product be released.

Now, with just one sentence, Gemini can create high-quality interactive SVGs. For example, the "electric fan" that is very popular on X is not only beautifully drawn but also animated and interactive, ready for direct use.

In addition, some netizens asked Gemini to draw a "plumber in a game."

A five-cylinder engine...

I also tried asking Gemini to draw a light bulb and added an operable switch to it. It completed the task in just 35 seconds.
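For readers wondering what an "interactive SVG" amounts to in practice, here is a small hand-written sketch in Python that builds one as a string. It is not Gemini's output; the function name `light_bulb_svg`, the shapes, and the colors are all made up for illustration. Clicking the switch toggles a CSS class that changes the bulb's fill.

```python
import xml.etree.ElementTree as ET


def light_bulb_svg(width: int = 200, height: int = 260) -> str:
    """Return a self-contained SVG: a bulb that lights up when the switch is clicked."""
    return f"""<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" viewBox="0 0 200 260">
  <style>
    .bulb {{ fill: #dddddd; transition: fill 0.2s; }}
    .on .bulb {{ fill: #ffd54a; }}
    .switch {{ cursor: pointer; }}
  </style>
  <g id="lamp">
    <circle class="bulb" cx="100" cy="90" r="60"/>
    <rect x="80" y="145" width="40" height="30" fill="#888888"/>
    <g class="switch" onclick="document.getElementById('lamp').classList.toggle('on')">
      <rect x="70" y="200" width="60" height="30" rx="15" fill="#444444"/>
      <text x="100" y="220" text-anchor="middle" fill="#ffffff" font-size="12">switch</text>
    </g>
  </g>
</svg>"""


svg = light_bulb_svg()
ET.fromstring(svg)  # raises ParseError if the markup is not well-formed XML
```

Saving `svg` to a file and opening it directly in a browser should give a clickable lamp. The inline `onclick` handler only runs when the SVG is opened directly or inlined in a page, not when it is loaded through an `<img>` tag.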

I also asked it to recreate my cat. To be honest, it really does look like my cat.

Interactive SVGs are not just for fun. They have greater significance and ambition.

Google said that, building on the powerful reasoning and multimodal capabilities of Gemini 3, it will launch a new feature called "Generative UI" in a future version of the Gemini app.

To put it simply, the Gemini app will be able to answer your questions and respond to your instructions with an interactive UI, rather than following the traditional "question-answer" interaction mode of large language models.

For example, according to Google's official example, if you ask Gemini to plan a 3-day trip to Rome next summer, it will generate dynamic content similar to a magazine. You can not only browse it but also interact with the elements inside.

This multimodal ability is probably what gives Google the confidence to say that it can "build anything."

(Some) programmers are in even greater danger...

In a sense, drawing with SVG is also a form of programming. When it comes to programming, that's where Gemini really shines.

According to tests by netizens on X, Gemini 3 Pro ranks first by a large margin in several projects in DesignArena.

Now, with just a relatively short description, you can ask Gemini 3 to create a "macOS operating system." After clicking "run," it will go through the "booting" process. Even more impressively, you can surf the Internet and run the terminal in the "macOS" it created...

And that's just the basics. Some netizens asked Gemini 3 to create a version of "Minecraft," and it did a great job.

I also ran my own test. I asked Gemini to build a personal website with four pages: home, personal introduction, works, and contact information. I asked for a modern, minimalist, high-end style.

Gemini completed the task in just one and a half minutes. The navigation bar uses the same frosted-glass style as Apple's, and the buttons and input boxes are all functional, not just decoration.

Still, this first version of the website looked fairly ordinary.

So I told it, "I want the style of a top-notch global design studio website, with more daring colors and layouts."

Forty-five seconds later, the result delivered by Gemini 3 Pro amazed me.

As for recreating a certain design, it's even easier.

I've also seen something even more amazing. A netizen on X asked Gemini 3 to create a 3D Lego editor, and it delivered the user interface, the code structure, and all the required functions in one go.

Within an hour of Gemini 3 Pro's launch, coding-assistance tools such as Cursor had already added support for it.

For professional developers, Google has also released Antigravity, a platform that looks like an IDE but is really a coding-assistance tool. It truly makes AI a "productivity assistant" for programmers: it can independently track development progress, create task lists or PPTs, write code, and then verify that the code works in the browser. It can even summarize and improve itself.

In this process, Antigravity will also learn your coding style and various development preferences.

In this context, benchmark scores may matter less than truly breaking the barrier between "thinking" and "doing." Take coding, for example. The technical barriers have largely been removed, and writing front-end code or adjusting frameworks may no longer be as crucial. The only thing that can set people apart is imagination.

As Google put it, Gemini 1 had multimodal capabilities from the start; Gemini 2 added stronger reasoning, enabling AI agents to think, program, and act independently; and Gemini 3 can generate whatever output format users want through the new generative UI. It has made steady progress all the way. To borrow a popular internet phrase: everyone had high hopes for you, and you've really lived up to them.

Image sources: Google and the author