Google takes the lead in L3-level AI: Gemini can work continuously on a task for 40 minutes, while its agents automatically generate and review hundreds of creative ideas.
Google is one step ahead in implementing the L3 level of AI defined by OpenAI.
The latest internal test shows that Gemini can run continuously for 40 minutes on a single task.
During this period, Gemini automatically generates over 100 creative ideas based on the user's input. A group of agents then scores and ranks these ideas and produces a structured review report.
In this way, users no longer have to deal with the AI's draft-like outputs. They can simply select from results already polished by the agents, significantly reducing the time spent on back-and-forth interactions with a single agent.
In other words, you only need to make decisions, and the exploration and iteration processes will be handled by the Agents.
Reportedly, this is the first time such a multi-agent system, in which a generator first produces ideas and a jury then scores them competitively, has appeared in a user-facing product.
Sure enough, Buffett's judgment holds up: Google is still Google.
Multi-Agent Competition System
How can agents be made to not just "answer questions" but truly "take users' inputs seriously"?
Google's approach is to combine a multi-agent workflow, long-horizon thinking, and adversarial generation.
In essence, this trades time for quality.
Within the multi-agent system, a single prompt goes through a complete generation-competition-screening process lasting over 40 minutes, rather than spitting out an answer all at once.
Specifically, the multi-agent system in Gemini for Enterprise first receives the theme and evaluation criteria, then generates a large number of initial creative ideas (over 100).
Multiple agents then score and rank these ideas in a competitive manner.
As a result, what is presented to users is not a single answer but a set of results distilled through a complete process:
roughly 100 creative ideas, sorted according to the criteria and accompanied by summaries, details, comments, complete review records, and an independently generated "competition performance report".
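The generate, score, and rank loop described above can be sketched roughly as follows. This is a minimal illustration, not Google's implementation; the function names, the plain-string "ideas", and the random jury scores are all stand-in assumptions for real LLM calls.

```python
import random

def generate_ideas(theme: str, n: int = 100) -> list[str]:
    # Stand-in for the generator agent: produce n candidate ideas for the theme.
    return [f"{theme}: idea #{i}" for i in range(n)]

def jury_score(idea: str, jurors: int = 3) -> float:
    # Stand-in for the jury agents: each juror scores the idea, and the
    # average decides its rank (real jurors would be LLM evaluators
    # applying the user's stated criteria).
    return sum(random.uniform(0, 10) for _ in range(jurors)) / jurors

def compete(theme: str, n: int = 100) -> list[tuple[str, float]]:
    # Generate candidates, have the jury score each one, and rank best-first.
    scored = [(idea, jury_score(idea)) for idea in generate_ideas(theme, n)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A call like `compete("ad campaign", n=100)` returns the ranked list that, in the real product, would arrive with summaries, comments, and a review report attached.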
In the current preview, Google has launched two application scenarios built on this competition system:
Idea Generation: after the user provides a theme, the system runs the multi-agent competition process to generate and rank creative ideas related to it.
Collaborative Research: the user specifies a research theme and supplies data; the agents generate and evaluate ideas through the same mechanism, with a greater focus on research tasks.
In fact, Google released a research-assisting agent as early as February this year, but compared with the capabilities in this internal test, its scale and performance are not on the same level.
On the one hand, a single inference run can now last 40 minutes.
On the other hand, the system can apply adversarial generation during inference to produce structured, insightful content.
This not only lets the agents handle more complex tasks but also improves the efficiency of human-machine collaboration.
In addition, to confirm requirements and save computing power, the system presents an overview of the planned evaluation items and creative dimensions before the formal run; the task only starts after user confirmation.
Beyond the competition system, Google is also testing a new "Document Dialogue Agent".
It has a dedicated interface that lets users upload PDF files of up to 30MB and interact directly with the document content.
The system loads up to 30MB of PDF content into the model context, enabling users to extract higher-quality conclusions and information from long documents.
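As a rough sketch of that size-gated flow (the function name, the list-of-strings "context", and the plain-text decode standing in for real PDF extraction are all assumptions, not Google's API):

```python
MAX_PDF_BYTES = 30 * 1024 * 1024  # the 30MB upload limit mentioned above

def load_document_into_context(pdf_bytes: bytes, context: list[str]) -> bool:
    # Reject files over the limit; otherwise append the extracted text to
    # the model context. Real PDF text extraction is replaced here by a
    # plain decode, purely for illustration.
    if len(pdf_bytes) > MAX_PDF_BYTES:
        return False  # caller would ask the user for a smaller file
    context.append(pdf_bytes.decode("utf-8", errors="ignore"))
    return True
```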
Although these functions are currently integrated into Gemini for Enterprise and still under development, Google's attempt can be regarded as an important step toward L3-level AI products.
Google Takes the Lead in L3 AI
Last year, OpenAI proposed a five-level AI classification system to track the development of Artificial General Intelligence (AGI).
According to this system, last year the field was moving from L1 (Conversational AI) to L2 (Reasoning AI).
This year, with the rapid development of agent technology, L3-level Agentic AI has taken the stage.
The core of L3 is "agentic capability": AI that can autonomously execute tasks with user authorization, run continuously for days, and adapt to environmental changes.
In other words, the key to L3 is long-term autonomous operation.
This is also why Gemini's ability to sustain multi-agent adversarial generation for 40 minutes matters:
through long-duration operation, multi-agent collaboration, and enterprise-grade computing power, it has turned "continuously working on a single task for dozens of minutes and iterating to optimize" into a usable product, moving closer to the definition of L3.
Some netizens even speculate that at this pace, there may be an agent that can work continuously for 3 hours next year.
Another netizen responded that Anthropic's timeline calls for an agent that works continuously for 8 hours in 2026.
By then, all humans would need to do is design the questions and evaluation criteria; the rest could be completed autonomously by the agents.
With further development of collaborative research, Gemini may even approach the threshold of L4 (Innovators).
Let's wait and see.
Reference Links:
[1] https://www.testingcatalog.com/google-to-enable-research-automation-on-gemini-enterprise/
[2] https://x.com/testingcatalog/status/1990177061852328329
This article is from the WeChat official account "QbitAI", author: henry. It is published by 36Kr with authorization.