
The space AI agent is born: Google Earth gets a cheat code, and overnight 2 billion people are warned of floods.

新智元2025-11-05 19:38
Google Earth AI is combined with Gemini to enable reasoning at global scale, supporting disaster warnings and population analysis.

[Introduction] The Earth-level intelligent agent is here! Google has integrated more than a decade of world modeling experience and the reasoning ability of Gemini into Earth AI.

When it comes to geographical AI, Google leads the way!

Google has achieved complex geospatial reasoning on a global scale for the first time, turning the Earth into a "computable object".

Drawing on decades of experience in world modeling and the advanced reasoning ability of Gemini, Google has significantly upgraded Earth AI, which covers everything from environmental monitoring to disaster response.

Google Earth AI is a family of geospatial AI models and datasets. It includes a Gemini-powered geospatial reasoning model that can automatically connect different Earth AI models (weather forecasting, population maps, and satellite imagery) to answer a wide range of questions.

Previously, Google launched the Google Maps tool in the Gemini API.

Google Earth gets a boost, not just with Gemini as a guide

While the capabilities of individual AIs are becoming increasingly powerful, real-world problems often require the integration of cross-disciplinary knowledge.

Where might a typhoon make landfall? Which communities are the most vulnerable? How should we prevent typhoon disasters?

To answer such questions, it is necessary to comprehensively process image, population, and environmental data and conduct integrated reasoning.

Google developed Earth AI this year for exactly this purpose.

This time, by combining powerful foundation models with a Gemini-based spatial reasoning agent, Google has for the first time achieved the ability to reason about complex real-world problems on an "Earth scale".

Based on real-world data, the foundation model provides in-depth knowledge of the Earth.

The intelligent agent acts as a wise commander: it breaks a complex problem down into a multi-step plan, executes that plan by calling the foundation models, querying a vast database, and using geospatial tools, and finally integrates the results of each stage into an overall solution.

Today, Google announced significant new advances in Earth AI:

  • Released a new generation of image and population foundation models and published technical details and evaluation reports
  • Proposed a spatial reasoning intelligent agent

Research shows that with geospatial reasoning, analysts can not only predict storm paths but also identify the most vulnerable communities and high-risk infrastructure at once.

For example, the non-profit organization GiveDirectly has improved disaster relief efficiency by integrating flood data and population density information to precisely locate affected groups in need of assistance.
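An overlay of this kind can be sketched in a few lines. This is only an illustration under stated assumptions: both datasets are taken to be already resampled onto the same grid, and the cell values, the `min_depth_m` threshold, and the function name `rank_affected_cells` are hypothetical, not taken from GiveDirectly or Google.

```python
from typing import List, Tuple

Grid = List[List[float]]


def rank_affected_cells(flood_depth_m: Grid, population: Grid,
                        min_depth_m: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (row, col, people) for flooded, populated cells, most populated first."""
    affected = []
    for r, (depth_row, pop_row) in enumerate(zip(flood_depth_m, population)):
        for c, (depth, people) in enumerate(zip(depth_row, pop_row)):
            # Keep only cells that are both flooded and inhabited.
            if depth >= min_depth_m and people > 0:
                affected.append((r, c, people))
    return sorted(affected, key=lambda cell: cell[2], reverse=True)


# Toy 2x3 grids with made-up values (flood depth in metres, residents per cell).
flood = [[0.0, 0.8, 1.2],
         [0.1, 0.6, 0.0]]
pop = [[500.0, 1200.0, 300.0],
       [900.0, 2500.0, 4000.0]]

ranking = rank_affected_cells(flood, pop)
```

Here the most populated cell, at row 1, column 2, is excluded because it is not flooded, which is the point of combining the two layers rather than looking at either one alone.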

Google said that the integrated dialogue function, which has been in pilot use since last year, can help users discover targets and patterns in satellite images. For example, users can simply input "find algal blooms" to let Google Earth monitor the status of drinking water sources.

What makes this research exciting is the important AI applications it is promoting:

  • Precision community health interventions carried out by Boston Children's Hospital
  • GiveDirectly's rapid identification of the groups most in need of assistance during disasters
  • The World Health Organization African Region's work to predict areas at risk of cholera outbreaks
  • Airbus's detection of vegetation encroaching on power transmission lines, which helps its customers prevent power outages
  • The University of Chicago's use of the model to predict the arrival of the Indian monsoon season, in cooperation with India's Ministry of Agriculture and Farmers' Welfare, to send precise forecasts to 38 million farmers

Google also has practical AI applications of its own:

During the 2025 California wildfires, Google sent crisis alerts to 15 million people in the Los Angeles area and displayed the locations of available shelters in real-time on the map.

Behind these achievements is Google's deep experience in geospatial AI: its models are used not only for flood and wildfire warnings but also for many other scenarios, such as cyclones and air quality.

Google achieves reasoning on an Earth scale for the first time

In the newly released technical paper, Google publicly presented the "Remote Sensing Foundation Model" and the "Population Dynamics Foundation Model" for the first time and demonstrated the powerful capabilities of the geospatial reasoning intelligent agent:

🌍 Intelligent Geographical Reasoning: The intelligent agent based on Gemini can coordinate multi-dimensional Earth AI models to answer complex cross-modal questions.

🌍 Upgraded In-depth Insights: Google Earth integrates Earth AI models and Gemini functions, allowing users to intelligently search for targets in satellite images using natural language.

🌍 Open Access in the Cloud: Through the Google Cloud platform, the core Earth AI models (image, population, environment) are directly open to trusted testers.

Link: https://arxiv.org/abs/2510.18318

Earth AI is built on multi-source, multi-modal geospatial data and tools (on the left in the figure below).

Then, the sub-agents and models in the three vertical fields of image, population, and environment process this data (in the middle of the figure below).

Finally, the Earth AI geospatial reasoning intelligent agent (on the right in the figure below) conducts global integration to achieve comprehensive geospatial analysis and insight generation.

Three Foundation Models: Image, Population, Environment

The remote sensing foundation model simplifies and accelerates satellite image analysis through three core capabilities.

First, synthetic annotations and data collected from the web form the training dataset for the core components.

The trained vision-language model and open-vocabulary detection model can be applied directly to classification, detection, and retrieval tasks; the Vision Transformer (ViT) encoder can be fine-tuned to improve performance on specific downstream tasks.

The training and application process of the remote sensing foundation model, whose core components are the vision-language model, the open-vocabulary object detection model, and the pre-trained ViT encoder

Users can use natural language to make queries and get fast and accurate responses, such as "find flooded roads in the image after heavy rain".

Jointly trained on a large number of high-resolution aerial images and text descriptions, the remote sensing foundation model has achieved breakthrough performance on multiple public Earth observation benchmarks: the average improvement on text-based image retrieval tasks is over 16%, and zero-shot detection accuracy on new object categories is more than twice the baseline.
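Text-based image retrieval with a vision-language model is commonly done by embedding the query text and the images into a shared vector space and ranking images by similarity. The sketch below illustrates that general idea only. The three-dimensional vectors, the tile names, and the `retrieve` function are hypothetical placeholders, and nothing here reflects the actual interface of Google's model.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float],
             image_vecs: Dict[str, Sequence[float]],
             top_k: int = 2) -> List[str]:
    """Return the names of the top_k images most similar to the query embedding."""
    ranked = sorted(image_vecs,
                    key=lambda name: cosine(query_vec, image_vecs[name]),
                    reverse=True)
    return ranked[:top_k]


# Made-up embeddings; a real system would obtain these from text and image encoders.
query = [0.9, 0.1, 0.0]          # stands in for the text "flooded road"
images = {
    "tile_a": [0.8, 0.2, 0.1],
    "tile_b": [0.0, 0.1, 0.9],
    "tile_c": [0.5, 0.5, 0.0],
}

top = retrieve(query, images)
```

With these toy vectors, `tile_a` ranks first and `tile_c` second, while `tile_b`, which points in a nearly orthogonal direction, is left out.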

To deeply understand the complex interaction between human activities and the geographical environment, it is necessary to study fields such as "Mobility AI" and the "Population Dynamics Foundation Model".

In this study, the population dynamics foundation model introduced two key innovations:

1. A global unified embedding representation covering 17 countries;

2. Dynamically updated monthly embeddings of human activities.

These new features are particularly important for time-sensitive predictions because they capture the changing rhythm of human behavior more accurately.

Training is divided into two stages:

The first stage is offline training: by integrating multiple geospatial data sources (map data, search trends, population flow activity, and environmental conditions), a compact regional embedding representation is generated.

The second stage uses the pre-trained embeddings for dynamic fine-tuning on downstream tasks, enabling functions such as spatial interpolation, extrapolation, super-resolution reconstruction, and trend prediction of local statistical data.
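The division of labour between the two stages can be illustrated with a toy example. The embeddings are treated as frozen output of the first stage, and a lightweight downstream model uses them to fill in a statistic for a region where it is missing (spatial interpolation). A k-nearest-neighbour average stands in for whatever downstream model is actually used; the region names, two-dimensional vectors, and values are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Embedding = Sequence[float]


def euclidean(a: Embedding, b: Embedding) -> float:
    """Straight-line distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def interpolate(target: Embedding,
                labeled: Dict[str, Tuple[Embedding, float]],
                k: int = 2) -> float:
    """Estimate a statistic as the mean over the k labeled regions
    whose embeddings are closest to the target embedding."""
    nearest = sorted(labeled.values(),
                     key=lambda item: euclidean(target, item[0]))[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Stage 1 output, assumed precomputed and frozen: embedding plus a known statistic.
labeled_regions = {
    "region_a": ([0.1, 0.9], 120.0),
    "region_b": ([0.2, 0.8], 100.0),
    "region_c": ([0.9, 0.1], 900.0),
}

# Stage 2: predict the statistic for a region that has an embedding but no label.
estimate = interpolate([0.15, 0.85], labeled_regions, k=2)
```

The design point is that only the small downstream step is task-specific, so the same embeddings can be reused across many prediction targets.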

Two-stage framework of the population dynamics foundation model

Google evaluated the "Population Dynamics Foundation Model" using data from 17 countries. The results show that in predicting four indicators (population density, tree cover, nighttime light intensity, and altitude), the R² scores (ranging from 0 to 1, with higher values being better) are excellent in every country.
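For readers unfamiliar with the metric, R² (the coefficient of determination) is conventionally computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses that standard definition with made-up numbers; it is not Google's evaluation code. Note that R² can fall below 0 when a model predicts worse than simply using the mean, so the 0 to 1 range describes models that do at least as well as that baseline.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up observations and predictions.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]

score = r_squared(actual, predicted)
```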

Google visualized the similarity of each dimension of the population dynamics foundation embedding vectors according to US zip codes. The patterns in different dimensions reflect the diverse characteristics of the US population.

Independent research results have also verified the powerful performance of this model.

For example, when researchers at the University of Oxford predicted the spread of dengue fever in Brazil, adding the embeddings provided by this model significantly improved the accuracy of long-term predictions: the R² value for 12-month predictions (a measure of how well the model explains the actual incidence rate) rose from 0.456 to 0.656.

Previously, Google had achieved technological breakthroughs in medium-range weather forecasting, monsoon onset prediction, air quality monitoring, and river flood warning.

Recently, it has upgraded the environmental model to support global nowcasting of precipitation and extended the coverage of major river flood warnings to 2 billion people.

Geospatial Reasoning Intelligent Agent: Unleashing Earth's Potential

Solving real-world problems requires integrating insights from multiple professional models.

The future of geospatial AI lies not in isolated single models but in an integrated multi-modal ecosystem coordinated by advanced AI.

Google's newly launched Gemini-powered geospatial reasoning intelligent agent can intelligently coordinate the different capabilities of Earth AI.

Research has confirmed that the integration of multiple models leads to stronger prediction capabilities.

The ultimate goal of Earth AI is to help users answer complex real-world problems that require multi-dimensional reasoning across models and data sources.

Such queries can be divided into three levels of complexity:

  1. Descriptive and Retrieval Queries: Fact-finding, such as "What was the highest temperature record in New York in August 2020?"
  2. Analytical and Associative Queries: Revealing pattern associations between different data sources, such as "How many hospitals in Louisiana were located in areas severely affected by Hurricane Katrina when it made landfall?"
  3. Prediction or Inference Queries: Information prediction, such as "Which cities in India will have the highest flood impact risk for vulnerable populations by November this year?"
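An analytical query of the second kind boils down to a spatial join: which points from one dataset fall inside an area from another. The sketch below shows the idea with a standard ray-casting point-in-polygon test. The coordinates are planar and invented, and the hospital names and `impact_zone` polygon are hypothetical; a real analysis would use geographic coordinates and a geospatial library.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: a point is inside if a horizontal ray from it
    crosses the polygon boundary an odd number of times."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y-coordinate can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical severely affected area (a rectangle) and hospital locations.
impact_zone: List[Point] = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
hospitals: Dict[str, Point] = {"H1": (1.0, 1.0), "H2": (5.0, 1.0), "H3": (3.5, 2.5)}

affected = [name for name, loc in hospitals.items()
            if point_in_polygon(loc, impact_zone)]
```

Counting `affected` answers the "how many hospitals" form of the question; the same join against a forecast footprint instead of an observed one moves the query toward the third, predictive level.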

To solve queries in these three complexity categories, Google has specifically designed a "Geospatial Reasoning Intelligent Agent".

To iteratively optimize responses, the intelligent agent continuously repeats the cycle of "thinking and planning → data operation/model reasoning/training → reflection and correction" until it generates a final answer based on reliable evidence.
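The cycle described above can be sketched as a simple control loop. Everything in this example is a hypothetical stand-in: the planner, the tool names, and the sufficiency check are stubs that show the shape of a plan, act, reflect loop, not how Google's agent is implemented.

```python
from typing import Callable, Dict, List


def run_agent(question: str,
              plan: Callable[[str, List[str]], str],
              tools: Dict[str, Callable[[str], str]],
              is_sufficient: Callable[[List[str]], bool],
              max_steps: int = 5) -> List[str]:
    """Repeat plan -> act -> reflect until the evidence suffices or steps run out."""
    evidence: List[str] = []
    for _ in range(max_steps):
        tool_name = plan(question, evidence)          # think and plan
        evidence.append(tools[tool_name](question))   # data operation / model call
        if is_sufficient(evidence):                   # reflect and decide to stop
            break
    return evidence


def plan_stub(question: str, evidence: List[str]) -> str:
    """Hypothetical planner: consult the environment model first, then population."""
    order = ["environment", "population"]
    return order[min(len(evidence), len(order) - 1)]


# Stub tools returning canned strings in place of real model outputs.
stub_tools = {
    "environment": lambda q: "storm footprint: coastal zone",
    "population": lambda q: "vulnerable residents in zone: 1200",
}

result = run_agent("Who is at risk?", plan_stub, stub_tools,
                   is_sufficient=lambda ev: len(ev) >= 2)
```

The `max_steps` cap is a design choice added for the sketch so that the loop always terminates even if the reflection step never judges the evidence sufficient.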

Operating framework of the geospatial reasoning intelligent agent

For example, when users need to identify specific vulnerable populations threatened by a storm, the intelligent agent achieves precise analysis through the following transparent reasoning steps:

  1. Environmental risk modeling: Call the environmental model to precisely delineate the geographical area threatened by hurricane -