
The significance of Gemini 3: AI has surpassed the "hallucination stage" and is approaching human capabilities. "Human-machine collaboration" will shift from "humans correcting AI" to "humans guiding AI in work."

Friends of 36Kr · 2025-11-19 09:59
Ethan Mollick believes the emergence of Gemini 3 signals the rise of "agent models" with autonomous action capabilities. Although it is not flawless, the errors it makes are no longer baseless "hallucinations" but are closer to the biases a human might show in judgment or in understanding intent. As AI capabilities improve, human-machine collaboration is evolving from "humans correcting AI errors" to "humans guiding AI's work."

The newly released Gemini 3 model by Google marks a crucial turning point in the field of artificial intelligence.

As Wall Street News previously reported, on Tuesday, November 18 (Eastern Time), Google officially launched its most powerful artificial intelligence (AI) model to date, Gemini 3. On release day it went live in Google Search, the Gemini app, and multiple developer platforms, and was deployed in several revenue-generating products.

Google executives emphasized at the launch event that Gemini 3 leads several popular industry benchmarks for measuring AI model performance. Demis Hassabis, CEO of Google's AI research lab DeepMind, said Gemini 3 is "the world's best multimodal understanding model" and also the company's most powerful agent and code-generation model to date.

According to an in-depth evaluation by Ethan Mollick, a professor at the Wharton School, the release of Gemini 3 and its companion tool "Antigravity" demonstrates astonishing "agent" capabilities. Compared with the GPT-3 model of three years ago, AI no longer just generates text: it can write code, build interactive applications, and perform multi-step tasks.

Mollick pointed out that this leap from "description" to "action" means AI is evolving from a conversational partner into a general-purpose tool that can access a computer and complete real work.

Mollick's conclusion is that we are moving from the "chatbot era" to the "digital colleague era." Although Gemini 3 is not flawless, its errors are no longer baseless "hallucinations" but closer to the biases a human might show in judgment or in understanding intent.

This indicates that the model of human-machine collaboration is changing: humans will no longer correct AI's basic errors but will provide high-level strategic guidance and direction. This may be the biggest change in the AI field since the release of ChatGPT.

The following is the full text of Ethan Mollick's evaluation article:

Three years, from GPT-3 to Gemini 3

From chatbots to intelligent agents

Ethan Mollick

November 19, 2025

I've been testing Google's newly released Gemini 3 model. It's excellent. Rather than list benchmark numbers, I decided to let the AI itself show how far we've come in these three years.

Nearly three years have passed since the release of ChatGPT. A few days before its launch, I wrote my first article on this Substack, about OpenAI's earlier GPT-3. Then ChatGPT came out, and I immediately wrote another: "I'm usually cautious about technology predictions, but this time it will change our world faster and more drastically than we expect. It's no longer just about automating repetitive and dangerous jobs. Now, the first jobs to be disrupted by AI may be those more related to analysis, creativity, writing, and communication."

Looking back now, I was right.

(Illustration: A screenshot of my first article after using GPT-3.5 three years ago)

I could verbally explain the gap between the original ChatGPT and Google's newly released Gemini 3, but it's unnecessary. I threw the above screenshot to Gemini 3 and only wrote one sentence:

Me: "Create something to show how much AI has progressed in these three years."

Gemini: "Sure. Your tweet is from November 2022, the eve of the modern AI boom. At that time, the biggest 'wow factor' was that AI could write a coherent paragraph or a poem on a silly topic like 'a candy-powered faster-than-light spaceship escaping from otters.'

To show the progress, instead of just describing it in words, I've created an interactive, playable 'candy-powered faster-than-light spaceship simulator' for you.

In 2022, AI could only describe the engine; in 2025, AI can write the engine code, design the interface, and let you pilot the spaceship yourself."

It really did create a mini-game in which I had to use candy energy to escape pursuing otters, with poems and assorted funny updates along the way. Readers of this blog probably already know AI can write code, but what's more interesting is what else Gemini 3 can do when you stop treating it as just a "chat box."

01

A tool that can write code

Not just "a tool for writing code"

Along with Gemini 3, Google also released Antigravity. Programmers will immediately recognize it: it's similar to Claude Code and OpenAI Codex, giving the model access to your computer and letting it write programs autonomously under your guidance.

If you're not a programmer, you might overlook it; I think that's a mistake. "Being able to write code" doesn't mean "being a programmer"; it means "being able to complete any task that can only be done on a computer." That completely redefines what these tools are.

Gemini 3 is extremely good at writing code, and that is relevant to you even if you don't consider yourself a "programmer." A fundamental view in the AI world is that anything you do on a computer ultimately boils down to code; as long as AI can write code, it can build dashboards, scrape websites, create slide decks, read files... This makes an "intelligent agent that can write code" a general-purpose tool. Antigravity productizes this concept: it gives me an "inbox" where I assign tasks to intelligent agents, and they notify me when they need approval or help.

(Illustration: Four intelligent agents running simultaneously, one working and one waiting for my response)

I communicate with them in English, not code; they use code to do work for me. Gemini 3 is good at making plans and knows what to do and when to ask for instructions. For example, I put all my past newsletter manuscripts in a folder and then gave the order:

"Create a beautiful webpage for me to summarize all my predictions about AI, and then search the internet to find out which ones were correct and which ones were wrong."

It read all the files, ran code, and first presented me with an editable plan. That was the first time it asked me for anything, and I was surprised by how accurately it had understood. I made a few minor changes and let it proceed.

Then it searched the web, built the website, took over the browser to check the results, and finally packaged the finished product for me. I gave it revision suggestions as I would to a real-life colleague, and it continued to iterate.

It's not perfect; intelligent agents aren't there yet. I didn't notice any hallucinations, but there were indeed areas that needed my correction. However, those errors were more like the judgment biases or misunderstandings a human colleague might have, rather than the absurd hallucinations of earlier AI. Importantly, I felt in control of the AI's decisions, because it would regularly check in and confirm its work, and its whole process was clearly visible to me. It felt more like managing a teammate than talking to an AI through a chat interface.

02

PhD-level intelligence?

Antigravity isn't the only surprise. Another shock is that it demonstrates real "judgment."

I often complain that AI benchmarks have become brutally competitive. Gemini 3 leads most rankings (it may not yet beat the $200-a-month GPT-5 Pro, but it might turn the tables when the "deep-thinking" version of Gemini 3 arrives). The industry likes the slogan "PhD-level intelligence." I decided to put it to a real test.

I threw at it a pile of old files from a crowdfunding research project from ten years ago, with file names like "project_final_seriously_this_time_done.xls", in the ancient STATA format. I gave a single command:

"Figure out the data structure on your own, clean up the STATA files, and get them ready for new analysis."

It really did restore the damaged data and make sense of the convoluted setup.
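The article doesn't show how the agent handled the files, but a minimal sketch of round-tripping and cleaning Stata-format data with pandas might look like this (the tiny stand-in dataset and file name are invented for illustration, since the original project files are not available):

```python
import pandas as pd

# Build a tiny stand-in dataset and round-trip it through the Stata
# binary format, mimicking "restore old .dta files for new analysis."
raw = pd.DataFrame({"ProjectName": ["A", "B"], "PledgedUSD": [100.0, None]})
raw.to_stata("demo_crowdfunding.dta", write_index=False)

# Load the legacy-format file back into a DataFrame.
df = pd.read_stata("demo_crowdfunding.dta")

# Basic cleanup: normalize column names and drop fully empty rows.
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
df = df.dropna(how="all")
```

Real legacy files would need more work (codebook recovery, value labels, encoding fixes), but `read_stata` is the standard entry point for this kind of rescue job in Python.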

Then, without any hints, I gave it a typical "second-year PhD student" task:

"Great. Now use these data to write an original paper. Do deep research in the field, elevate the topic to the level of entrepreneurship or strategy theory, run rigorous statistics, and format it for a journal."

It selected the topic, proposed hypotheses, ran the statistics, created charts, and formatted the paper, managing the hardest part, balancing topic choice against feasibility, on its own. I just vaguely said "make it more substantial," and finally got a 14-page paper.

(Illustration: The first two pages of the paper)

Even more impressive, it created an index on its own: it used NLP to compare each project description mathematically against a large number of other descriptions, measuring the "uniqueness" of crowdfunding ideas. It wrote the code and verified the results by itself.
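The agent's actual method isn't shown; one standard way to build such a "uniqueness" index (an assumption on my part) is to treat each description as a bag of words and define uniqueness as one minus the average cosine similarity to every other description:

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def uniqueness(descriptions: list[str]) -> list[float]:
    # A project's uniqueness = 1 minus its average similarity
    # to every other project description in the corpus.
    bags = [Counter(d.lower().split()) for d in descriptions]
    scores = []
    for i, bi in enumerate(bags):
        others = [cosine(bi, bj) for j, bj in enumerate(bags) if j != i]
        scores.append(1.0 - sum(others) / len(others))
    return scores

projects = [
    "smart watch that tracks fitness and sleep",
    "smart watch that tracks fitness goals",
    "a candy powered faster than light spaceship",
]
scores = uniqueness(projects)
# The spaceship pitch shares almost no words with the two watch
# pitches, so it should score as the most unique of the three.
```

A production version would likely use TF-IDF weighting or text embeddings instead of raw word counts, but the shape of the index, distance from the rest of the corpus, is the same.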

So, does this count as "PhD-level"?

If you mean "able to do the work of a competent graduate student at a top-tier university," the answer is partly "yes." But it also has the typical problems of graduate students: flaws in the statistical methods, theoretical leaps that are too large, an evidence chain that isn't rigorous enough... We've moved past the "hallucination" stage and entered a subtler, more "human-like" zone of flaws. Interestingly, when I gave it open-ended suggestions as I would a student ("read more of the crowdfunding literature to ground the method"), it improved significantly; with more guidance, it might approach the "PhD" level.

What is Gemini 3?

It's an extremely capable "thinking + execution" partner that billions of people around the world can use at will; it's also a mirror reflecting several trends: AI's undiminished pace of progress, the rise of intelligent agents, and the need for humans to learn to manage "smart AI."

Three years ago, we marveled that a machine could write a poem about otters; fewer than 1,000 days later, I'm debating statistical methods with an intelligent agent that built its own research environment.

The chatbot era is giving way to the "digital colleague" era.

Yes, Gemini 3 is still not perfect and requires a "human manager" who can give instructions and check its work. But the "human-in-the-loop" model is evolving from "humans cleaning up after AI" to "humans guiding AI's work", and that may be the biggest paradigm shift since the release of ChatGPT.

Easter egg:

I asked Gemini to "check the required size first, then create a Substack cover image for me using only code." It searched the internet for the specifications and then drew the image purely through mathematics, completing the whole process on its own.
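The article doesn't show that code, but "drawing an image purely through mathematics" can be as simple as computing every pixel from a formula and writing a raw image file with no graphics library at all. A minimal sketch (the gradient, the 300×150 stand-in size, and the PPM output format are my choices, not the article's):

```python
# Render a simple gradient "cover image" with no graphics library:
# compute each pixel's color mathematically, write a binary PPM file.
W, H = 300, 150  # stand-in size; a real Substack cover spec is larger

rows = []
for y in range(H):
    row = bytearray()
    for x in range(W):
        r = 255 * x // (W - 1)   # red ramps left to right
        g = 255 * y // (H - 1)   # green ramps top to bottom
        b = 128                  # constant blue channel
        row += bytes((r, g, b))
    rows.append(bytes(row))

with open("cover.ppm", "wb") as f:
    f.write(f"P6 {W} {H} 255\n".encode())  # PPM header: magic, size, maxval
    f.writelines(rows)
```

Any image viewer that understands the Netpbm formats can open the result; a real cover would layer text and shapes on top, but the principle, pixels as pure math, is the same.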

**Obligatory warning:**

Giving AI agents access to your computer is risky: they may move or delete files without asking, or even leak documents. Things will be much better once the tools are ready for non-programmers; for now, be extremely cautious.

This article is from the WeChat official account "Hard AI", author: Ye Zhen, published by 36Kr with authorization.