Up to 20x more efficient: UCLA develops OmniCast to solve the error-accumulation problem of autoregressive weather forecasting models.
A team from the University of California, Los Angeles, in collaboration with the U.S. Argonne National Laboratory, has proposed a new latent diffusion model called OmniCast for high-precision probabilistic subseasonal-to-seasonal (S2S) weather forecasting.
Subseasonal-to-seasonal (S2S) weather forecasting sits between short-term weather forecasting and long-term climate prediction. It targets the weather evolution of the next 2 to 6 weeks, filling the gap in medium-to-long-term meteorological prediction and providing a key basis for agricultural planning, disaster prevention, and more. However, S2S prediction can rely neither on atmospheric initial conditions, whose information decays rapidly (the basis of short-to-medium-range forecasts), nor on the slowly varying boundary signals that have not yet fully emerged (the basis of climate prediction). Under a chaotic atmospheric system and complex land-sea-air interactions, the forecasting difficulty rises sharply.
In recent years, the technological shift from traditional numerical weather prediction (NWP) systems to deep-learning-driven forecasting methods has done much to advance S2S forecasting, yet practical challenges remain. Traditional numerical methods rely on solving complex physical equations, which is computationally expensive and slow. Data-driven methods are fast and accurate for short-range forecasts, but they are typically autoregressive, computing each step from the previous prediction; over the longer S2S horizon, errors accumulate like a snowball, and these models also ignore the slowly varying boundary forcing signals that are key to S2S forecasting.
In response, the UCLA team, in collaboration with Argonne National Laboratory, proposed OmniCast, a new latent diffusion model for high-precision probabilistic S2S weather forecasting. The model combines a variational autoencoder (VAE) with a Transformer and samples jointly across space and time, which significantly alleviates the error accumulation of autoregressive methods while learning weather dynamics beyond the initial conditions. Experiments show that the model reaches state-of-the-art levels in accuracy, physical consistency, and probabilistic metrics.
The research, titled "OmniCast: A Masked Latent Diffusion Model for Weather Forecasting Across Time Scales," has been accepted to NeurIPS 2025, a top AI conference.
Research Highlights:
* By generating future weather jointly across the spatial and temporal dimensions, OmniCast avoids the growing errors of earlier autoregressive designs.
* OmniCast accounts for both the atmospheric initial conditions needed for short-term weather forecasting and the slowly varying boundary forcing conditions needed for climate prediction.
* OmniCast outperforms existing methods in accuracy, physical consistency, and probabilistic prediction, and runs 10-20 times faster than mainstream methods.
Paper URL: https://go.hyper.ai/YANIu
Dataset: built on the widely used ERA5 dataset, split and adapted to different forecasting tasks
To support OmniCast's training and evaluation, the researchers used ERA5, the high-resolution reanalysis dataset widely used in meteorology, as the base data source. The data were preprocessed for two forecasting tasks, medium-range weather forecasting and S2S weather forecasting, to serve as benchmark test sets adapted to each task's requirements.
Specifically, the researchers first extracted 69 meteorological variables from the ERA5 reanalysis dataset (see the sketch after this list), covering two major categories of core indicators:
Surface variables (4): 2-meter air temperature (T2m), 10-meter U wind component (U10), 10-meter V wind component (V10), and mean sea-level pressure (MSLP);
Atmospheric variables (5): geopotential height (Z), air temperature (T), U wind component, V wind component, and specific humidity (Q), each at 13 pressure levels (hPa): 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, and 1000.
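For concreteness, here is a minimal Python sketch of how these 69 channels can be enumerated (4 surface variables plus 5 atmospheric variables at 13 levels); the names and ordering are illustrative, not taken from the paper's code:

```python
# Sketch: assembling the 69-channel variable stack described above.
# Variable names and ordering are illustrative assumptions.

SURFACE_VARS = ["t2m", "u10", "v10", "mslp"]            # 4 surface variables
ATMOS_VARS = ["z", "t", "u", "v", "q"]                  # 5 atmospheric variables
PRESSURE_LEVELS = [50, 100, 150, 200, 250, 300, 400,
                   500, 600, 700, 850, 925, 1000]       # 13 levels (hPa)

channels = list(SURFACE_VARS)
for var in ATMOS_VARS:
    for lev in PRESSURE_LEVELS:
        channels.append(f"{var}{lev}")                  # e.g. "z500", "t850"

assert len(channels) == 4 + 5 * 13 == 69                # matches the 69 channels
print(channels[:6])  # ['t2m', 'u10', 'v10', 'mslp', 'z50', 'z100']
```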
Then, for each forecasting task, the researchers split the training, validation, and test sets by time range:
Medium-range weather forecasting: WeatherBench2 (WB2) serves as the benchmark. The training set covers 1979 to 2018, the validation set 2019, and the test set 2020. Initial conditions use data at 00:00 and 12:00 UTC, at the native 0.25° resolution (721 x 1440 grid).
S2S weather forecasting: ChaosBench serves as the benchmark. The training set covers 1979 to 2020, the validation set 2021, and the test set 2022. Initial conditions use data at 00:00 UTC, at 1.40625° resolution (128 x 256 grid).
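These splits can be captured in a simple configuration; the sketch below only restates the numbers above, with hypothetical key names:

```python
# Sketch: time-based train/val/test splits for the two benchmark tasks.
# Structure and key names are illustrative, not from the paper's code.

SPLITS = {
    "medium_range": {               # WeatherBench2 benchmark
        "train": ("1979", "2018"),
        "val": ("2019", "2019"),
        "test": ("2020", "2020"),
        "init_times_utc": ["00:00", "12:00"],
        "resolution_deg": 0.25,     # 721 x 1440 grid
    },
    "s2s": {                        # ChaosBench benchmark
        "train": ("1979", "2020"),
        "val": ("2021", "2021"),
        "test": ("2022", "2022"),
        "init_times_utc": ["00:00"],
        "resolution_deg": 1.40625,  # 128 x 256 grid
    },
}
```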
OmniCast Model: a two-stage design that builds a new paradigm for S2S weather prediction
OmniCast's core capability is to avoid the error accumulation of traditional autoregressive models altogether, meeting the requirements of both short-term weather prediction and long-term climate prediction and providing a practical, reliable tool for S2S applications. The core architecture of OmniCast follows a two-stage design: a VAE first reduces the data's dimensionality, then a Transformer with a diffusion head generates the temporal sequence.
The core module of the first stage is a VAE implemented with a UNet architecture. Its role is dimensionality reduction and reconstruction: it compresses high-dimensional raw weather data into low-dimensional, continuous latent tokens (latent feature vectors), easing the computational burden imposed by the many variables and high spatial resolution. The VAE has 69 input/output channels, one per meteorological variable. In the S2S task, for example, the encoder compresses a raw weather state of size 69 x 128 x 256 into a latent map of size 1024 x 8 x 16, a 16x reduction along each spatial axis. During generation, the VAE decodes the latent tokens output by the Transformer back to weather data in the original dimensions (air temperature, pressure, and so on).
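The shape contract of this stage can be illustrated with a toy PyTorch encoder. The real model is a UNet-style VAE, so the plain stack of strided convolutions below only demonstrates the 16x-per-axis downsampling, not the actual architecture:

```python
import torch
import torch.nn as nn

# Toy illustration of the encoder's shape contract:
# 69 x 128 x 256 weather fields -> 1024 x 8 x 16 latents.
# Four stride-2 convolutions halve H and W each time (2^4 = 16x per axis).

class ToyEncoder(nn.Module):
    def __init__(self, in_ch=69, latent_ch=1024):
        super().__init__()
        chs = [in_ch, 128, 256, 512, latent_ch]
        layers = []
        for c_in, c_out in zip(chs[:-1], chs[1:]):
            layers += [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                       nn.GELU()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

x = torch.randn(1, 69, 128, 256)   # one S2S-resolution weather state
z = ToyEncoder()(x)
print(z.shape)                     # torch.Size([1, 1024, 8, 16])
```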
Notably, the researchers chose a continuous VAE over a discrete one: with so many weather variables, a discrete VAE would force an excessively high compression ratio and serious information loss, hurting the second-stage generative modeling. The continuous VAE compresses by only about 100x, retaining more of the key meteorological information for weather states that can contain hundreds of physical variables.
The core module of the second stage is a masked generative Transformer (shown in the figure below) that uses the encoder-decoder architecture of a masked autoencoder (MAE). It is the key to generation without cumulative error, directly modeling the full future sequence of latent tokens through masked training and diffusion-based prediction. Structurally, the researchers used a bidirectional encoder-decoder, which conditions on the initial-state tokens and the visible generated tokens simultaneously to predict the masked portion. The Transformer has 16 layers, each with 16 attention heads, a hidden dimension of 1024, and a dropout rate of 0.1.
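A simplified sketch of the masked-token setup follows, assuming a plain bidirectional Transformer over a flattened token sequence; the paper's MAE-style encoder-decoder split and the diffusion head are omitted here:

```python
import torch
import torch.nn as nn

# Simplified sketch of masked generative training: future latent tokens are
# randomly replaced by a learned mask token, and a bidirectional Transformer
# predicts them from the visible (initial-condition and generated) tokens.
# Sizes follow the article: 16 layers, 16 heads, hidden dim 1024, dropout 0.1.

D, LAYERS, HEADS = 1024, 16, 16
block = nn.TransformerEncoderLayer(d_model=D, nhead=HEADS, dropout=0.1,
                                   batch_first=True)
backbone = nn.TransformerEncoder(block, num_layers=LAYERS)
mask_token = nn.Parameter(torch.zeros(1, 1, D))

tokens = torch.randn(2, 64, D)             # [batch, time*space tokens, dim]
is_masked = torch.rand(2, 64) < 0.5        # random mask over future tokens
x = torch.where(is_masked.unsqueeze(-1),   # swap masked positions for the
                mask_token.expand_as(tokens), tokens)  # learned mask token
h = backbone(x)                            # bidirectional: every position
print(h.shape)                             # attends to all visible tokens
```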
Schematic of the Transformer backbone network
In addition, because latent tokens are continuous vectors, a conventional classification head cannot model their distribution. A diffusion head (a small MLP) is therefore attached to the Transformer output to predict the distribution of the masked latent tokens (shown in the figure below).
The denoising network ε_θ predicts the noise ε from the condition vector z_i and the noisy token x_i^s
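Here is a hedged sketch of what such a per-token diffusion head could look like: a small MLP that takes the noisy latent token and the Transformer's condition vector and predicts the noise. The timestep handling and the toy noising step are placeholder assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

# Sketch of a per-token diffusion head: an MLP eps_theta that predicts the
# noise added to a latent token x_s, conditioned on the Transformer output z
# and the diffusion step s. A real implementation would embed the timestep
# (e.g., sinusoidal features) and use a proper noise schedule.

class DiffusionHead(nn.Module):
    def __init__(self, dim=1024, hidden=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim * 2 + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x_s, z, s):
        # x_s: noisy latent token at step s; z: Transformer condition vector
        s = s.float().unsqueeze(-1)
        return self.net(torch.cat([x_s, z, s], dim=-1))  # predicted noise

head = DiffusionHead()
x0 = torch.randn(8, 1024)            # clean latent tokens
eps = torch.randn_like(x0)           # true noise
x_s = x0 + eps                       # toy "noising" (schedule omitted)
loss = nn.functional.mse_loss(
    head(x_s, torch.randn(8, 1024), torch.randint(0, 1000, (8,))), eps)
```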
To improve short-term accuracy, the researchers also introduced an auxiliary mean-squared-error loss. In short-term forecasting, the chaotic character of the weather system grows markedly after about 10 days, so deterministic prediction gradually loses meaning beyond that point. An additional deterministic MLP head therefore computes an MSE loss on the first 10 frames of latent tokens, with exponentially decreasing weights that emphasize accurate prediction in the earliest frames.
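A minimal sketch of such a weighted auxiliary loss, assuming a decay rate of 0.5 (the article does not give the actual value):

```python
import torch

# Sketch of the auxiliary deterministic loss: MSE on the first 10 predicted
# frames with exponentially decaying weights, so early frames (where
# deterministic forecasting is still meaningful) dominate the loss.

def auxiliary_mse(pred, target, n_frames=10, decay=0.5):
    # pred, target: [batch, time, tokens, dim] latent sequences
    w = decay ** torch.arange(n_frames, dtype=pred.dtype)  # 1, 0.5, 0.25, ...
    w = w / w.sum()                                        # normalize weights
    err = (pred[:, :n_frames] - target[:, :n_frames]) ** 2
    per_frame = err.mean(dim=(0, 2, 3))                    # MSE per frame
    return (w * per_frame).sum()

loss = auxiliary_mse(torch.randn(2, 44, 128, 1024),
                     torch.randn(2, 44, 128, 1024))
```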
Results: compared against two classes of methods, OmniCast runs 10-20 times faster than the baseline models
To verify OmniCast's effectiveness, the researchers compared it against two classes of mainstream methods: state-of-the-art deep-learning models and numerical methods based on traditional physical models. As noted above, the evaluation covers two tasks, medium-range and S2S weather forecasting, with metrics spanning accuracy, physical consistency, and probabilistic skill.
First, on the S2S task, the researchers compared OmniCast with two deep-learning methods, PanguWeather (PW) and GraphCast (GC), as well as the numerical ensemble systems of four countries and regions: UKMO-ENS (UK), NCEP-ENS (USA), CMA-ENS (China), and ECMWF-ENS (Europe).
On accuracy metrics (root-mean-square error (RMSE), absolute bias, and multi-scale structural similarity (SSIM)), OmniCast's RMSE and SSIM at short lead times are slightly worse than the baselines, as expected given its training objective. As the lead time grows, however, its relative performance improves, and beyond 10 days it matches the best-performing ECMWF-ENS. As shown in the figure below:
Deterministic performance of different methods over the 1-44-day forecast period for three key variables: solid lines are deep-learning methods, dashed lines are numerical methods
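As context for the RMSE curves, weather benchmarks in the WeatherBench family typically use a latitude-weighted RMSE, which weights grid cells by cos(latitude) so that the shrinking area of high-latitude cells does not dominate. A minimal NumPy version:

```python
import numpy as np

# Latitude-weighted RMSE, the standard accuracy metric in WeatherBench-style
# evaluations. Weights are proportional to cos(latitude), normalized to mean 1.

def lat_weighted_rmse(pred, truth, lats_deg):
    # pred, truth: [lat, lon] fields; lats_deg: [lat] latitudes in degrees
    w = np.cos(np.deg2rad(lats_deg))
    w = w / w.mean()
    return np.sqrt((w[:, None] * (pred - truth) ** 2).mean())

lats = np.linspace(-90, 90, 128)   # latitudes of a 128 x 256 grid
print(lat_weighted_rmse(np.random.rand(128, 256),
                        np.random.rand(128, 256), lats))
```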
Notably, OmniCast has the smallest bias of all the models compared, maintaining near-zero bias across the three target variables.
On physical consistency, OmniCast is significantly better than the other deep-learning methods and in most cases surpasses all baselines. This indicates that OmniCast effectively retains signals across frequency ranges, preserving the physical plausibility of its forecasts. As shown in the figure below:
Physical-consistency metrics of different methods over the 1-44-day forecast period for three key variables: solid lines are deep-learning methods, dashed lines are numerical methods
On probabilistic metrics (continuous ranked probability score (CRPS) and spread-skill ratio (SSR), where an SSR closer to 1 is better), the picture resembles the accuracy metrics: at shorter lead times OmniCast trails ECMWF-ENS slightly, but it overtakes it after 15 days. Overall, across variables and lead times, OmniCast and ECMWF-ENS are the two best-performing methods. As shown in the figure below:
Probabilistic metrics of different methods over the 1-44-day forecast period for three key variables: solid lines are deep-learning methods, dashed lines are numerical methods
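For reference, the spread-skill ratio is commonly computed as ensemble spread divided by the RMSE of the ensemble mean; an SSR near 1 indicates a well-calibrated ensemble, below 1 under-dispersion, above 1 over-dispersion. A minimal sketch:

```python
import numpy as np

# Spread-skill ratio (SSR): ensemble spread / RMSE of the ensemble mean.

def spread_skill_ratio(ensemble, truth):
    # ensemble: [member, lat, lon]; truth: [lat, lon]
    mean = ensemble.mean(axis=0)
    skill = np.sqrt(((mean - truth) ** 2).mean())           # RMSE of the mean
    spread = np.sqrt(ensemble.var(axis=0, ddof=1).mean())   # mean ens. variance
    return spread / skill

ens = np.random.randn(50, 128, 256)   # toy 50-member ensemble
print(spread_skill_ratio(ens, np.random.randn(128, 256)))
```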
Beyond these experiments, the team also compared OmniCast with recent deep-learning methods for long-range weather prediction, including ClimaX and Stormer (both Transformer-based). The results show that OmniCast outperforms both on all metrics. On RMSE, its T850 and Z500 scores are 16.8% and 16.0% lower than ClimaX's, and 11.6% and 10.2% lower than Stormer's; on CRPS, they are 20.2% and 17.1% lower than ClimaX's, and 13.9% and 11.0% lower than Stormer's. This demonstrates a clear advantage in long-range prediction: by combining a latent diffusion model with a masked generative framework, OmniCast models long-range dependencies in weather sequences better than conventional deep-learning architectures. As shown in the figure below:
Accuracy comparison of OmniCast with other deep-learning methods
Then, on the medium-range weather forecasting task, the team compared OmniCast with GenCast, a mainstream deep-learning method for probabilistic forecasting, and with IFS-ENS, the state-of-the-art physics-based ensemble forecasting system.