Google quietly added a button. Gemini has grown "hands and feet" and become a worker. The three tech giants are vying to teach AI to work.
A leaked screenshot has revealed some important details about this year's Google I/O conference ahead of schedule:
Gemini no longer just wants to chat; it also wants to be an Agent workbench.
Recently, TestingCatalog, which has long tracked Google's product changes, discovered a new "Agents" entry in Gemini.
It is listed alongside Gems and Files as a primary entry, rather than a hidden option only visible to developers.
This change sends a clear signal:
In the next phase, Gemini will no longer be just a "question-and-answer" chat box, but a workbench where you assign tasks and it executes them.
Looking at Google's product moves in the past six months, the picture becomes clear:
The Agent Designer was fully launched on Gemini Enterprise. NotebookLM added audio generation and video summarization. Agentspace was integrated into Gemini Enterprise as its core engine. Chrome embedded the Gemini sidebar and launched Auto Browse, allowing AI to operate the browser on your behalf.
Now, an "Agents" tab sits alongside the Chat tab in Gemini's chat interface. Users can create new tasks, specify goals, and attach tools and files directly within it. The entire interface looks more like a task execution workbench than a chat window.
Each step is doing the same thing: bringing Agent capabilities from the developer backend to ordinary users.
Before the I/O conference even starts, Google has already shown half its hand.
Chat Is No Longer the Sole Center
If you look at Google's official product descriptions, you'll notice a change in tone.
When Gemini for Google Workspace was first launched in February 2024, what was the selling point? Chat.
Chatting with AI that helps you write emails and take meeting notes. Essentially, it was a chat assistant integrated into Workspace.
Now, look at Google's description of Gemini Enterprise on its official website: "Gemini Enterprise enables teams to discover, create, share, and run AI Agents on a secure platform."
Chatting ability remains one of the cores of Gemini Enterprise, but it is now clearly incorporated into a larger Agent platform framework.
The testing interface of Gemini Enterprise exposed by TestingCatalog
According to the testing interface of Gemini Enterprise exposed by TestingCatalog, Agents have entered the main interaction area: You can switch between Chat and Agent on the left, and the right side integrates the goal, Agent, application connection, and file panels.
In the newly added "Agents" tab, the first things you'll see are the clear entries for "New Task" and "Inbox".
When starting a new task, the interface expands into a powerful task workspace.
Although the core chat view remains, a structured task panel appears on its right side.
This panel clearly defines the various elements of the task, including a clear "goal", the "Agent" to execute the task, the "connected applications" that can be accessed, and the required "files".
In addition, the right sidebar includes a "Require human review" switch: users can add a human review node to the task's execution flow. This makes the interface feel like a task execution workspace rather than an ordinary chat window.
This indicates that when you open Gemini, you're not just going to chat, but to "run a task".
This also confirms that Google's definition of Gemini Enterprise has shifted from a "chat assistant" to a powerful "Agent operating platform".
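The task panel described above can be sketched as a simple data shape. Everything in the sketch below, including the class name, the field names, and the sample values, is an illustrative guess based on the leaked screenshot; it is not a real Gemini Enterprise API.

```python
from dataclasses import dataclass, field

@dataclass
class AgentTask:
    """Hypothetical model of the fields visible in the leaked task panel.

    Field names are guesses inferred from the screenshot description,
    not part of any actual Gemini Enterprise interface.
    """
    goal: str                                   # the "goal" shown in the panel
    agent: str                                  # which Agent executes the task
    connected_apps: list = field(default_factory=list)  # e.g. Gmail, Drive
    files: list = field(default_factory=list)           # attached files
    require_human_review: bool = False          # the switch in the right sidebar

# Example task mirroring the kind of setup the screenshot suggests.
task = AgentTask(
    goal="Summarize this week's support tickets",
    agent="Support Triage Agent",
    connected_apps=["Gmail", "Google Drive"],
    files=["tickets.csv"],
    require_human_review=True,
)
print(task)
```

The point of the shape is the last field: with `require_human_review` enabled, the task would pause for a person before completing, which is exactly what distinguishes a task workspace from a chat window.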
No Coding Required
You Can Still Create Agents
At the product level, the most crucial piece of the puzzle is the Agent Designer, which was officially launched at the end of 2025.
Google officially defines it as:
An interactive no-code/low-code platform for creating, managing, and publishing single-step and multi-step Agents in Gemini Enterprise.
Let's break down its three key capabilities:
First, Multi-step Agents.
It's not just a single instruction like "Write me an email". It supports multi-step task orchestration, and sub-Agents can be attached under an Agent to form a workflow.
Second, Connect to Real Tools.
Gmail, Google Drive, Jira, GitHub, Notion, SharePoint: according to the official update log, these connectors have already launched, with more, such as Shopify, in public preview.
Third, Schedule Execution.
Agents don't require your constant attention. You can set a time for them to run on their own.
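Taken together, the three capabilities amount to a schedulable workflow of sub-steps. The sketch below is purely conceptual: the toy "connector" functions, the sub-agent names, and the use of Python's standard `sched` module are all assumptions for demonstration, not how Agent Designer (a no-code visual tool) actually works.

```python
import sched
import time

# Toy "connectors" standing in for real integrations like Jira and Gmail.
def fetch_open_issues():
    return ["JIRA-101: login bug", "JIRA-102: slow dashboard"]

def draft_email(body):
    return "Draft email:\n" + body

# A multi-step Agent: each sub-agent handles one step of the workflow.
def triage_sub_agent():
    issues = fetch_open_issues()            # step 1: pull issues via a connector
    return [i for i in issues if "bug" in i]

def report_sub_agent(bugs):
    return draft_email("Open bugs this week:\n" + "\n".join(bugs))

def weekly_report_agent():
    bugs = triage_sub_agent()               # sub-Agent 1: filter
    return report_sub_agent(bugs)           # sub-Agent 2: report

# Scheduled execution: run the Agent on a timer instead of on demand.
scheduler = sched.scheduler(time.time, time.sleep)
scheduler.enter(0, 1, lambda: print(weekly_report_agent()))  # zero delay for demo
scheduler.run()
```

In the real product the same three ideas (sub-agents, connectors, schedules) are configured by clicking rather than coding; the code just makes the moving parts visible.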
Before this, Google had already verified this approach through Agentspace (now integrated into Gemini Enterprise): integrating knowledge search and Agent execution on the same platform.
Employees don't need to care which Agent is running or which data source is being accessed. They can search, ask questions, and run tasks all in one interface.
The appearance of the consumer Agent tab in the leaked interface means that this capability won't be limited to the enterprise version.
Google is likely to roll it out to all users.
A Brain Alone Isn't Enough
You Also Need Hands and Feet
There's an easily confused concept here that needs clarifying first.
An Agent is not the same as a large model.
A large model is more like the "brain" of an Agent, responsible for understanding tasks, reasoning about paths, and making decisions.
But to actually complete a task, it also needs a layer of "hands and feet": the orchestration layer, which breaks tasks into steps, invokes tools, carries context between them, and handles exceptions during execution.
This is the capability that Google has added this time.
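A minimal sketch of that brain-versus-hands split, with a stub function standing in for the model "brain" and a loop playing the orchestration layer. Every name here is invented for illustration; a real system would call an actual LLM where `stub_model` appears.

```python
# Toy tool registry: the "hands" the orchestrator can invoke.
TOOLS = {
    "search_files": lambda q: ["doc about " + q],
    "send_summary": lambda text: "sent: " + text,
}

def stub_model(goal, history):
    """The 'brain': decides the next step from the goal and prior results.

    A hard-coded stand-in for an LLM's reasoning, for illustration only.
    """
    if not history:
        return ("search_files", goal)
    if len(history) == 1:
        return ("send_summary", str(history[0]))
    return ("done", None)

def orchestrate(goal, max_steps=5):
    """The 'hands and feet': invoke tools, carry context, handle exceptions."""
    history = []
    for _ in range(max_steps):
        action, arg = stub_model(goal, history)
        if action == "done":
            return history
        try:
            history.append(TOOLS[action](arg))    # invoke the chosen tool
        except Exception as e:
            history.append("error: " + str(e))    # surface failures to the brain
    return history

result = orchestrate("quarterly metrics")
print(result)
```

The model only ever proposes the next action; the loop around it is what turns those proposals into completed work, and that loop is the layer Google is now shipping.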
From the public information, the Agent Designer in Gemini Enterprise can be understood as a visual Agent workbench for ordinary enterprise users: you can orchestrate single-step and multi-step tasks without writing code.
In contrast, the Agent Designer in Vertex AI Agent Builder is more focused on the underlying and developer scenarios.
Their capability frameworks are highly similar, but the former offers a friendlier, lower-barrier product interface.
In other words, Google isn't just making the model better at chatting. It's packaging the Agent building capabilities that were originally more developer-oriented into a visual workbench that ordinary users can use.
For consumer users, this means one thing: you don't need to understand APIs or write Python. You can drag and drop to let AI run a workflow for you.
The gap between an "AI that can chat" and an "AI that can do work" is this orchestration layer.
The Tripartite Battle in the Orchestration Layer
Taking a broader view: Google isn't the only one vying for the orchestration layer.
Anthropic and OpenAI have each taken completely different paths; the differences among the three are so great that they might as well be building three different products.
Let's start with the concepts.
Google is taking the platform approach.
It embeds Agent capabilities into its existing product matrix: Workspace, Search, NotebookLM, Google Cloud, leveraging its distribution advantage to dominate.
The logic is clear: With the ability to reach over 2 billion users as a moat, once the Agents are created, they can be directly integrated into the tools that users are already using.
Anthropic is taking the tool approach.
Claude Cowork runs on the desktop and directly operates local files, folders, and applications.
Anthropic's official product page states:
It can freely switch between different applications, integrate information from multiple sources, and complete tasks without users having to coordinate each step.
https://www.anthropic.com/product/claude-cowork
It doesn't build a platform or an ecosystem. Instead, it makes the model itself an Agent.
OpenAI seems to be taking a route that combines both platform and ecosystem:
On one hand, it expands third-party supply and distribution through GPTs and the GPT Store. On the other hand, it migrates from the Assistants API to the Responses API on the API side and uses the Agents SDK to support more complete Agent development.
Now, let's look at the architectural differences.
Google emphasizes the orchestration layer.
Vertex AI Agent Builder provides a complete framework, and the Agent Designer serves as the front end. Multi-Agent collaboration at the enterprise level is the core selling point.
Anthropic focuses less on orchestration and more on capabilities.
The model natively supports tool invocation and environment interaction, leaving orchestration to developers. Claude's philosophy: rather than build a framework for you, make the model itself powerful enough that you can orchestrate it however you like.
OpenAI is in the middle.
The Assistants API provides a layer of orchestration abstraction, but it's not as heavy as Google's. The GPT Store is responsible for distribution, but the ecosystem's activity has always been a question mark.
The target users are also completely different.
Google targets enterprise IT departments and ordinary consumers, with the lowest barrier to entry. Anthropic targets developers and power users, with the highest ceiling. OpenAI tries to cover both, targeting developers and consumers alike.
Interestingly, the competition among the three is no longer about "whose model is smarter". The ease of use of the orchestration layer and the richness of the ecosystem are the decisive factors for developers to choose a platform.
Who Will Be the First to Get a Billion People Using Agents
This time, the battlefield is not at the model layer.
Google CEO Sundar Pichai once said in an official blog post that Google's competitiveness doesn't lie in just one model version, but in the complete full-stack capabilities behind it:
From research, models, and tools, to product entrances that can reach billions of users, and then to the global cloud network and data center system.
As Agents move from APIs to GUIs, the tipping point for "everyone to use" is approaching.
At this tipping point, the importance of distribution capabilities is rapidly surpassing model performance.
Anthropic's advantage lies in being the first to bring native Agent capabilities like "computer use" to the forefront.
Claude can already interact with the desktop environment through screenshots, mouse, and keyboard. Cowork also clearly emphasizes that it's not a chat assistant, but a system that can switch between local files, folders, and applications and perform multi-step knowledge work on behalf of users.
However, Anthropic's shortcoming is also obvious: it lacks a consumer product matrix like Google's. Cowork is officially still in research preview; although it's expanding quickly, large-scale default distribution remains a long way off.
With less than a month until the Google I/O Conference, Google is likely to further disclose its Agent strategy.
This is more like a bet between "distribution" and "execution".
Google's bet is that when Agent capabilities are integrated into Gemini, Workspace, and more product entrances, the existing distribution network will quickly complete user education.
Anthropic's bet is that after developers and advanced users experience Agents that can cross applications and operate on the desktop, they'll be willing to pay for execution capabilities first.
The focus of this Agent competition is shifting from "who can chat better" to "who can complete tasks better".
The competition is not only about the execution ability of Agents but also about who can deliver this ability to users the fastest and on the largest scale.
Both Google and Anthropic are betting on Agents, but they're betting on different ways to win.
Reference Materials:
https://www.testingcatalog.com/google-develops-its-own-desktop-agent-to-compete-with-cowork/
This article is from the WeChat official account "New Intelligence Yuan". Author: New Intelligence Yuan, Editor: Yuan Yu. Republished by 36Kr with permission.