
Google's late-night move: Robots learn to read dashboards and work, with a 300% surge in success rate

ZDXX (zhidxcom), 2026-04-15 11:25
In partnership with Boston Dynamics, Google equips robots with a brain that can read gauges.

On April 15th, ZDXX reported that late last night, Google launched Gemini Robotics-ER 1.6.

In September last year, Google released Gemini Robotics-ER 1.5. After more than half a year, Google's robot model has finally undergone a major upgrade.

Gemini Robotics-ER 1.6 enables robots to understand the surrounding environment with unprecedented precision and has been upgraded in multiple key reasoning abilities, including visual and spatial understanding, task planning, and task completion judgment. It can serve as a high-level reasoning model for robots, natively invoking Google Search, VLA, and other third-party custom functions to autonomously complete complex work tasks.

Google mentioned that compared with Gemini Robotics-ER 1.5 and Gemini 3.0 Flash, Gemini Robotics-ER 1.6 has significantly improved in spatial and physical reasoning abilities, especially in aspects such as point location, counting, and task success judgment.

Meanwhile, Gemini Robotics-ER 1.6 adds a new ability: instrument reading, which lets robots read complex devices such as pressure gauges and liquid level observation windows. The function was developed jointly by Google and Boston Dynamics and is particularly suited to high-precision industrial tasks.

As shown in the figure, Gemini Robotics-ER 1.6 reaches an 80% success rate on pointing and counting tasks, 90% on single-view success detection, and 84% on multi-view success detection. On instrument reading combined with Agentic Vision, its success rate reaches 93%, roughly a 300% improvement over Gemini Robotics-ER 1.5's 23%.

From now on, developers can use Gemini Robotics-ER 1.6 through the Gemini API and Google AI Studio.
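As a rough illustration of what consuming the model's point output might look like, here is a minimal sketch. It assumes the point format documented for Gemini Robotics-ER 1.5 carries over to 1.6: points returned as JSON objects with `[y, x]` coordinates normalized to a 0-1000 range, which the client converts back to pixel coordinates.

```python
import json

def parse_points(response_text, image_w, image_h):
    """Convert point annotations in the Gemini Robotics-ER style
    ([y, x] order, normalized to 0-1000) into pixel coordinates.
    The schema is assumed from the ER 1.5 documentation."""
    points = json.loads(response_text)
    out = []
    for p in points:
        y, x = p["point"]
        out.append({"label": p["label"],
                    "x": round(x / 1000 * image_w),
                    "y": round(y / 1000 * image_h)})
    return out

sample = '[{"point": [500, 250], "label": "hammer"}]'
print(parse_points(sample, 1920, 1080))
# → [{'label': 'hammer', 'x': 480, 'y': 540}]
```

In a real application, `response_text` would come from a Gemini API call made through the `google-genai` SDK or Google AI Studio.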

01. Upgraded point location and multi-view reasoning abilities enhance the autonomy of robots in performing tasks

Point location is a foundational ability of embodied reasoning models; it enables different types of reasoning tasks, including spatial reasoning, relational logic, motion reasoning, and constraint understanding.

Gemini Robotics-ER 1.6 can use points as intermediate reasoning steps to complete more complex tasks. It can count the objects in an image by pointing at each one, or improve the accuracy of size and distance estimates by identifying key positions and combining them with mathematical calculations.

As shown in the figure, Gemini Robotics-ER 1.6 knows when to point at a target and when to refrain. It correctly identifies 2 hammers, 1 pair of scissors, 1 paintbrush, 6 pairs of pliers, and a set of gardening tools in the picture, and it does not label the wheelbarrow or drill, neither of which appears in the image.

In contrast, Gemini Robotics-ER 1.5 miscounts the hammers and paintbrushes, misses the scissors entirely, and even hallucinates a non-existent wheelbarrow; its localization of the pliers is also poor.

Gemini 3.0 Flash comes fairly close to Gemini Robotics-ER 1.6, but its handling of the pliers is still not ideal.

Gemini Robotics-ER 1.6 has also improved its multi-view reasoning ability, enabling it to better understand the images from multiple cameras and the relationships between them. Even in an environment with dynamic changes or occlusions, it can maintain a high level of judgment ability.

Gemini Robotics-ER 1.6 can comprehensively integrate the information from multiple camera views to determine whether the task of "putting the blue pen into the black pen holder" has been completed.

This kind of task success judgment (Success Detection) is a key part of the robot's autonomy, as it determines whether the robot should try again or proceed to the next step during the task execution process.
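The retry-or-proceed logic that success detection enables can be sketched as a simple control loop. Here `act` and `detect_success` are hypothetical stand-ins for the robot's action executor and the model's (possibly multi-view) success judgment:

```python
def execute_with_success_detection(steps, act, detect_success, max_retries=2):
    """Run a task plan step by step. After each action, query a success
    detector and retry the step if it failed; give up once retries are
    exhausted. A sketch of the control flow, not Google's implementation."""
    for step in steps:
        for _attempt in range(max_retries + 1):
            act(step)                  # e.g. "put the blue pen into the black pen holder"
            if detect_success(step):   # e.g. a VLM judgment over camera views
                break                  # step done, move to the next one
        else:
            return False               # exhausted retries on this step
    return True
```

A caller might simulate a flaky first attempt to see the retry path exercised before wiring in real perception and actuation.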

02. Instrument recognition reaches a 93% success rate, enabling robots to perform equipment inspection tasks

Industrial plants are filled with various precision instruments such as thermometers, pressure gauges, and chemical liquid level observation windows, which require long-term continuous monitoring. To solve these complex problems in real industrial scenarios, robots must learn to recognize instrument readings.

Gemini Robotics-ER 1.6 enables robots to read a variety of instruments, including circular pressure gauges, vertical liquid level gauges, and modern digital reading devices.

Reading instruments is not a simple recognition task but a complex visual reasoning process. The system must accurately perceive various visual elements, such as pointers, liquid levels, container boundaries, and scale lines, and understand the relationships between them.

Take the liquid level observation window as an example: the model must estimate the actual fill level while accounting for distortion from the camera angle. For pressure gauges, the system must also read and interpret the units printed on the face; some dials even have multiple pointers corresponding to different decimal places, and a correct reading requires combining them.
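For a circular gauge, the core of the reading step is mapping the needle's angle onto the scale. Below is a minimal sketch that assumes the needle angle and the scale's endpoints have already been extracted from the image; a multi-pointer dial would repeat this per pointer and combine the digits:

```python
def gauge_reading(pointer_deg, min_deg, max_deg, min_val, max_val):
    """Map a gauge needle angle to a scale value by linear interpolation.
    Angles are measured along the gauge's sweep direction; localizing the
    needle and scale endpoints in the image is assumed done upstream."""
    frac = (pointer_deg - min_deg) / (max_deg - min_deg)
    return min_val + frac * (max_val - min_val)

# A 270-degree pressure gauge spanning 0-10 bar, needle at 135 degrees:
print(gauge_reading(135, 0, 270, 0, 10))  # → 5.0
```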

Relying on instrument reading and the upgraded task reasoning ability, Boston Dynamics' Spot quadruped robot can carry out fully autonomous inspections, independently perceiving, understanding, and responding to real-world industrial challenges.

The reason why Gemini Robotics-ER 1.6 can achieve high-precision instrument reading is that it uses Agentic Vision technology, which combines visual reasoning with code execution.

Specifically, the model takes a series of intermediate steps: it zooms in on the image to observe the instrument's details more clearly, estimates ratios and ranges through point marking and code execution, and finally obtains an accurate reading, interpreting its meaning with world knowledge.
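The "point marking plus code execution" step can be illustrated for a liquid level observation window: once the model has marked the container top, the container bottom, and the liquid surface on a zoomed-in crop, a snippet like this computes the fill ratio. The pixel rows here are made-up values for illustration:

```python
def fill_ratio(top_y, bottom_y, surface_y):
    """Estimate how full a sight glass is from three marked pixel rows
    (image y grows downward). In an agentic-vision loop, the model would
    mark these points on a crop and then run code like this."""
    return (bottom_y - surface_y) / (bottom_y - top_y)

print(fill_ratio(top_y=100, bottom_y=500, surface_y=340))  # → 0.4
```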

In the instrument reading task, success rates rise step by step across the four configurations: Gemini Robotics-ER 1.5 scores 23%; Gemini 3.0 Flash, 67%; Gemini Robotics-ER 1.6, 86%; and Gemini Robotics-ER 1.6 with agentic vision enabled, 93%.

03. Conclusion: robots need sufficient safety to enter real-world applications

As robots are deployed at scale in civilian and industrial scenarios, safety has become as important as intelligence and autonomy, and it is now the core threshold for bringing embodied intelligence into practice.

Google said that Gemini Robotics-ER 1.6 has not only advanced across core capabilities such as environmental perception, spatial reasoning, and industrial instrument recognition, but has also undergone a systematic safety upgrade, making it Google's robotics model with the best safety performance to date.

In the adversarial spatial reasoning task, Gemini Robotics-ER 1.6 complies with the Gemini safety policy better than all previous versions. At the same time, Gemini Robotics-ER 1.6 has also significantly improved in complying with physical safety constraints.

For example, in tasks involving point output, it can more safely judge which objects the mechanical gripper may grasp and which it must not touch, respecting gripper limitations or material constraints such as "do not handle liquids" and "do not grasp objects heavier than 20 kilograms."
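Constraints like these amount to a filter over candidate objects before any grasp is attempted. A hedged sketch with an invented object schema (the field names and thresholds are illustrative, not from Google's API):

```python
def safe_to_grasp(obj, max_kg=20.0, forbidden_materials=("liquid",)):
    """Check one candidate object against gripper constraints like those
    described above: a weight limit and forbidden material classes.
    The object dict schema here is a made-up illustration."""
    if obj.get("material") in forbidden_materials:
        return False
    if obj.get("weight_kg", 0.0) > max_kg:
        return False
    return True

objects = [{"name": "wrench", "weight_kg": 1.2, "material": "metal"},
           {"name": "beaker", "weight_kg": 0.3, "material": "liquid"},
           {"name": "anvil",  "weight_kg": 45,  "material": "metal"}]
print([o["name"] for o in objects if safe_to_grasp(o)])  # → ['wrench']
```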

Google also tested the model's ability to recognize potential safety hazards in text and video scenarios, with tests based on real-world accident reports. Here too, Gemini Robotics-ER 1.6 improves on Gemini 3.0 Flash: by 6% in the text scenario and 10% in the video scenario, indicating more accurate recognition of potential injury risks.

For embodied intelligence, what really determines whether a robot can leave the laboratory and enter large-scale real-world scenarios is not only a stronger "brain" but also the safety and reliability behind each perception, judgment, and action.

This article is from the WeChat official account “ZDXX” (ID: zhidxcom), author: Xu Lisi, editor: Mo Ying. Republished by 36Kr with authorization.