
Pre-trained on the first trillion-time-point dataset: Tsinghua University releases Sundial, a generative large time-series model.

新智元 2025-06-20 16:28
Tsinghua's Sundial Large Time-Series Model: Flow-Matching Generation, Pre-trained on Trillions of Time Points.

[Introduction] The School of Software at Tsinghua University has released Sundial, a generative large time-series model. It moves beyond the constraints of discretization, processes continuous values without loss, generates forecasts via flow matching, alleviates mode collapse during pre-training, supports non-deterministic probabilistic forecasting, and provides dynamic support for decision-making.

Recently, work on a large time-series model from the National Engineering Research Center for Big Data System Software at Tsinghua University was accepted as an Oral paper at ICML 2025.

Paper link: https://arxiv.org/pdf/2502.00816

Code link: https://github.com/thuml/Sundial

Open-source model: https://huggingface.co/thuml/sundial-base-128m

As soon as the paper was published, the work attracted attention from both academia and industry.

One week after its release on HuggingFace, Sundial ranked fourth on the Trending list of the time-series-forecasting section, with 6k downloads.

HuggingFace Time Series Forecasting section

The main contributions of this work are as follows:

  • To address the non-determinism of time series forecasting, a flow-matching-based forecasting loss is proposed. It generates multiple forecast trajectories from a historical sequence and alleviates mode collapse during the pre-training of large time-series models.
  • The first high-quality time-series dataset at the scale of trillions of time points is constructed, and a pre-trained model supporting zero-shot forecasting is released.
  • Without task-specific fine-tuning, the model achieves breakthrough results on multiple forecasting leaderboards against statistical methods and deep models, while offering millisecond-level inference speed.

Large Time-Series Models

Time series reveal the changing patterns of data over time. Time series forecasting plays an important role in many fields such as meteorology, finance, and the Internet of Things.

There are numerous statistical, machine learning, and deep learning methods for time-series data, but each class of methods has its own regime of strength:

Deep learning models perform well, but tend to degrade when data is scarce.

Statistical methods are fast, but must be fit to each series separately and lack generalization ability.

The scaling relationship between training data volume and model performance also holds for time series analysis

Recent research aims to build large time-series models: pre-train on large-scale time-series data, then forecast out-of-distribution data (zero-shot prediction).

Since no additional training is required, resource consumption is concentrated at inference time. The speed is comparable to that of statistical methods such as ARIMA, with stronger generalization ability.

Companies such as Google, Amazon, and Salesforce have successively developed their own large time-series models to provide out-of-the-box forecasting capabilities in specific scenarios.

Non-Deterministic Prediction

Currently, deep models in the industry mainly support deterministic prediction: given a historical sequence, a fixed prediction result is produced.

However, time series forecasting is inherently non-deterministic: how certain a forecast can be depends on how sufficient the available information is.

Deep learning models the stochastic process underlying time-series dynamics in a data-driven manner, and the observed sequence itself is just one sample of that stochastic process.

Therefore, time series forecasting not only faces incomplete information; even when information is sufficient, future outcomes remain uncertain to some degree.

Decision-making often requires a risk assessment of the forecast (e.g., variance or confidence level), so probabilistic forecasting ability is crucial.

Pre-training Mode Collapse

Probabilistic forecasting in itself is not difficult: the mean-squared loss corresponds to fitting a Gaussian predictive distribution, and the pinball loss yields quantile forecasts.
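
As a concrete illustration, here is a minimal PyTorch sketch (not from the Sundial codebase) of the pinball loss: minimizing it for a chosen quantile level q makes the prediction estimate the q-th quantile, whereas the mean-squared loss corresponds to fitting the mean of a Gaussian predictive distribution.

```python
import torch

def pinball_loss(pred: torch.Tensor, target: torch.Tensor, q: float) -> torch.Tensor:
    """Pinball (quantile) loss: asymmetric penalty that makes `pred` estimate the q-th quantile."""
    diff = target - pred
    return torch.mean(torch.maximum(q * diff, (q - 1.0) * diff))

# Toy example: for q = 0.9, under-prediction is penalized 9x more than over-prediction.
pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.5, 1.5, 4.0])
print(pinball_loss(pred, target, q=0.9))
```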

However, endowing large time-series models with probabilistic forecasting ability is challenging: large-scale time-series data often exhibit complex multi-modal distributions, and similar historical sequences may be followed by completely different futures in different domains or samples.

The non-determinism of time series forecasting stems from the time series data itself

When trained on the complex and heterogeneous distributions of large-scale time-series data, previous models often produce "over-smoothed" forecasts (right panel of the figure above).

Although such a result is globally optimal with respect to the optimization objective, it carries little practically useful information.

The authors call this phenomenon "mode collapse" in time-series models. It stems from loss functions that carry distributional priors, which restrict the model's hypothesis space.

To alleviate mode collapse, Moirai uses a mixture distribution to handle ambiguous prediction situations. However, the mixture distribution still introduces probabilistic priors and is not flexible enough.

Amazon's Chronos discretizes time series and uses a cross-entropy loss to learn a multi-modal probability distribution under only a weak prior.

However, the cross-entropy loss depends on discretization, which suffers from precision loss and poor out-of-vocabulary generalization; it is not native to continuous values.
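
For intuition, here is a minimal sketch of the discretize-then-classify idea (in the spirit of Chronos, not its actual implementation; the bin count and value range below are made-up assumptions). Mapping continuous values into a fixed token vocabulary is exactly where precision loss and out-of-vocabulary values come from.

```python
import torch

LO, HI, N_BINS = -5.0, 5.0, 1024  # assumed normalization range and vocabulary size

def tokenize(x: torch.Tensor) -> torch.Tensor:
    """Map (normalized) continuous values to integer token ids by uniform binning.
    Values outside [LO, HI] are clipped -- the out-of-vocabulary problem."""
    return ((x.clamp(LO, HI) - LO) / (HI - LO) * (N_BINS - 1)).round().long()

def detokenize(ids: torch.Tensor) -> torch.Tensor:
    """Map token ids back to bin centers; detail finer than the bin width is lost."""
    return LO + ids.float() / (N_BINS - 1) * (HI - LO)

x = torch.tensor([0.123456, 7.0, -0.5])   # 7.0 lies outside the assumed vocabulary range
print(detokenize(tokenize(x)))            # reconstruction no longer matches the input exactly
```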

The differences between Sundial and previous large time-series models are as follows:

(1) Temporal Nativeness: no discretization is required; continuous values are encoded directly with a Transformer, breaking through the limitations of language-style modeling.

(2) Distribution Flexibility: no distributional prior is introduced; flexible data distributions are learned with a generative model, going beyond parametric densities.

To resolve the tension between nativeness and flexibility, this work studies native continuous encoding and generative modeling in depth and proposes the first generative large time-series model based on flow matching.

It requires no discretization and can process and forecast continuous-valued sequences directly. It assumes no predictive distribution, unleashing the model's learning capacity on large-scale time-series data.

Temporal Transformer + Flow-Matching Generation

The backbone of Sundial is a scalable Transformer. It adapts to the characteristics of time-series data with techniques such as re-normalization, patch embedding, and multi-patch prediction, and incorporates FlashAttention and KV Cache for efficiency.
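
As a rough illustration (a generic PyTorch sketch under assumed shapes, not the released Sundial code), input preparation of this kind typically re-normalizes each series per sample, splits it into fixed-length patches, and linearly embeds each patch as one Transformer token:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Re-normalize a series per sample, then embed non-overlapping patches as Transformer tokens."""
    def __init__(self, patch_len: int = 16, d_model: int = 256):
        super().__init__()
        self.patch_len = patch_len
        self.proj = nn.Linear(patch_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len); seq_len assumed divisible by patch_len for simplicity
        mean, std = x.mean(dim=-1, keepdim=True), x.std(dim=-1, keepdim=True) + 1e-5
        x = (x - mean) / std                                     # per-sample re-normalization
        patches = x.unfold(-1, self.patch_len, self.patch_len)   # (batch, n_patches, patch_len)
        return self.proj(patches)                                # (batch, n_patches, d_model)

tokens = PatchEmbedding()(torch.randn(8, 512))
print(tokens.shape)  # torch.Size([8, 32, 256])
```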

Sundial can be regarded as a kind of generative model conditioned on the historical sequence.

Based on the context representations extracted by the Transformer, the researchers propose the TimeFlow Loss, which injects the representation of the historical sequence into the flow-matching process as the generation condition.

Flow matching is a cutting-edge technique in generative modeling. By learning a velocity field, it can transform a simple distribution into an arbitrarily complex one; sampling random noise from the simple distribution then yields samples that follow the complex distribution.

The proposed loss function introduces no probabilistic prior. Sampling randomness is introduced into the training process, which expands the hypothesis space of the predictive distribution, strengthens the model's fitting ability, and lets it handle the distributional heterogeneity of time-series data more flexibly.
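
The following is a minimal sketch of one conditional flow-matching training step (generic PyTorch, not the exact TimeFlow Loss; the velocity network and shapes are illustrative assumptions): the network is trained to predict the velocity that transports a noise sample toward the future values along a straight path, conditioned on the Transformer's context representation.

```python
import torch
import torch.nn as nn

def flow_matching_loss(net: nn.Module, cond: torch.Tensor, future: torch.Tensor) -> torch.Tensor:
    """Conditional flow matching: regress the velocity (future - noise) along the straight path
    x_t = (1 - t) * noise + t * future, conditioned on the context representation `cond`."""
    noise = torch.randn_like(future)                        # sample from the simple base distribution
    t = torch.rand(future.size(0), 1)                       # random interpolation time in [0, 1]
    x_t = (1 - t) * noise + t * future                      # point on the probability path
    target_velocity = future - noise                        # velocity of the straight path
    pred_velocity = net(torch.cat([x_t, cond, t], dim=-1))  # hypothetical velocity network
    return torch.mean((pred_velocity - target_velocity) ** 2)

# Toy usage with made-up sizes: 256-dim context representation, 16-step future patch.
net = nn.Sequential(nn.Linear(16 + 256 + 1, 128), nn.GELU(), nn.Linear(128, 16))
loss = flow_matching_loss(net, cond=torch.randn(32, 256), future=torch.randn(32, 16))
loss.backward()
```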

During inference, sampling repeatedly from the simple distribution lets the model generate multiple forecast trajectories consistent with the historical pattern. From these samples, the distribution of the forecast sequence can be constructed, from which point forecasts, variances, and confidence intervals are estimated.
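
Continuing the sketch above (reusing the assumed `net` and shapes), one way to realize this at inference time is to integrate the learned velocity field from noise with a simple Euler solver, repeat the sampling, and summarize the trajectories into a point forecast, variance, and interval:

```python
import torch

@torch.no_grad()
def sample_futures(net, cond, horizon=16, n_samples=100, n_steps=20):
    """Draw n_samples future patches by Euler-integrating the learned velocity field from t=0 to t=1."""
    cond = cond.repeat(n_samples, 1)                       # one copy of the context per trajectory
    x = torch.randn(n_samples, horizon)                    # start from the simple (Gaussian) distribution
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((n_samples, 1), i * dt)
        x = x + dt * net(torch.cat([x, cond, t], dim=-1))  # Euler step along the velocity field
    return x

samples = sample_futures(net, cond=torch.randn(1, 256))
mean, std = samples.mean(dim=0), samples.std(dim=0)                    # point forecast and uncertainty
lo, hi = samples.quantile(0.05, dim=0), samples.quantile(0.95, dim=0)  # 90% interval
```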

Sundial can generate possible future scenarios through multiple samplings.

Pre-training with Trillions of Time Points

This work constructs TimeBench, the largest time-series dataset in the field, consisting of real and synthetic data. It covers fields such as meteorology, finance, transportation, energy, and the Internet of Things, includes sampling frequencies ranging from hourly to daily and various forecast horizons, and totals a trillion (10^12) time points.

TimeBench consists of a large amount of real data and a small amount of synthetic data, covering multiple application fields of time series forecasting

Pre-training on this trillion-point corpus with increasing data volume and parameter scale verifies the scaling law of generative large time-series models.

Training curves of models with different parameter scales

Results on Prediction Leaderboards

Sundial is tested on multiple leaderboards, covering various input and output lengths and including point prediction and probabilistic prediction scenarios:

GIFT-Eval Leaderboard: Sundial's zero-shot forecasting ability exceeds that of previous models such as Chronos and Moirai, as well as deep models trained on in-distribution data.

GIFT-Eval is a prediction leaderboard released by Salesforce. It includes 24 datasets, more than 144,000 time series, and 177 million data points, spanning 7 fields and 10 frequencies and covering multivariate, short-term, and long-term prediction scenarios

FEV Leaderboard: Sundial significantly outperforms statistical methods such as ARIMA and achieves results comparable to Chronos, but only requires 1/35 of the inference time.

FEV is a prediction leaderboard released by AutoGluon. It includes 27 datasets. The metrics from left to right are probabilistic forecasting (WQL), point forecasting (MASE), and inference time (ms)

Time-Series-Library Leaderboard: Sundial achieves the first-place zero-shot forecasting result, and the result continues to improve as the parameter scale increases.