Global Attention with Linear Complexity for Exascale Generative Data Assimilation in Earth System Prediction
This work addresses the scalability bottleneck in data assimilation for Earth system prediction, enabling unprecedented resolution and uncertainty quantification for extreme weather events.
This work introduces a generative data assimilation framework that reformulates the problem as Bayesian posterior sampling, achieving 1.6 ExaFLOP sustained performance on 32,768 GPUs and scaling to 20 billion spatiotemporal tokens for km-scale global modeling over 177k temporal frames.
Accurate weather and climate prediction relies on data assimilation (DA), which estimates the state of the Earth system by combining observations with models. While exascale computing has significantly advanced Earth simulation, scalable and accurate inference of the Earth system state remains a fundamental bottleneck that limits uncertainty quantification and the prediction of extreme events. We introduce a unified one-stage generative DA framework that reformulates assimilation as Bayesian posterior sampling, replacing the conventional forecast-update cycle with compute-dense, GPU-efficient inference. At its core is STORM, a novel spatiotemporal transformer whose global attention algorithm scales with linear complexity, breaking the quadratic attention barrier. On 32,768 GPUs of the Frontier supercomputer, our method achieves 63% strong scaling efficiency and 1.6 ExaFLOP sustained performance. We further scale to 20 billion spatiotemporal tokens, enabling km-scale global modeling over 177k temporal frames. These regimes were previously unreachable, and reaching them establishes a new paradigm for Earth system prediction.
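
The abstract does not spell out how STORM obtains global attention at linear cost. As an illustration of the general idea only, the sketch below uses generic kernelized linear attention, a well-known technique that is not claimed to be the authors' algorithm. The feature map `feature_map` (elu(x) + 1) and the function names are assumptions made for this example. It shows that reordering the matrix products avoids forming the N x N attention matrix while still letting every token attend to every other token.

```python
import numpy as np

def feature_map(x):
    # Positive feature map elu(x) + 1 (an assumption for this sketch).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def quadratic_attention(q, k, v):
    # Reference version: builds the full (N, N) weight matrix, O(N^2) in tokens.
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = phi_q @ phi_k.T                        # (N, N)
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v                              # (N, d_v)

def linear_attention(q, k, v):
    # Same result, but keys and values are summarized first, O(N) in tokens.
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v                                # (d, d_v), independent of N
    k_sum = phi_k.sum(axis=0)                       # (d,)
    numer = phi_q @ kv                              # (N, d_v)
    denom = phi_q @ k_sum                           # (N,)
    return numer / denom[:, None]

rng = np.random.default_rng(0)
n_tokens, d_model = 256, 16                         # toy sizes, not from the paper
q = rng.standard_normal((n_tokens, d_model))
k = rng.standard_normal((n_tokens, d_model))
v = rng.standard_normal((n_tokens, d_model))

out_quadratic = quadratic_attention(q, k, v)
out_linear = linear_attention(q, k, v)
```

Both functions compute the same kernel-normalized weighted average of the values. Only the order of the multiplications differs, which is why memory and compute in the second version grow linearly with the number of tokens.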