LG · AI · CE · May 12, 2024

CaFA: Global Weather Forecasting with Factorized Attention on Sphere

CMU
arXiv:2405.07395v1 · 6 citations · h-index: 43
Originality: Incremental advance
AI Analysis

This work addresses computational bottlenecks in weather forecasting for meteorologists and climate scientists, offering an incremental improvement in efficiency for Transformer-based models.

The paper tackles the computational challenge of applying Transformer models to global weather forecasting by proposing a factorized-attention-based model tailored for spherical geometries, achieving deterministic forecasting accuracy on par with state-of-the-art data-driven models at 1.5° resolution and 0-7 days' lead time while improving accuracy-efficiency trade-offs.

Accurate weather forecasting is crucial in various sectors, impacting decision-making processes and societal events. Data-driven approaches based on machine learning models have recently emerged as a promising alternative to numerical weather prediction models, given their potential to capture physics at different scales from historical data and their significantly lower computational cost at prediction time. Renowned for its state-of-the-art performance across diverse domains, the Transformer model has also gained popularity in machine learning weather prediction. Yet applying Transformer architectures to weather forecasting, particularly on a global scale, is computationally challenging because attention has quadratic complexity in the number of spatial points, and the number of spatial points itself grows quadratically as resolution increases. In this work, we propose a factorized-attention-based model tailored for spherical geometries to mitigate this issue. More specifically, it uses multi-dimensional factorized kernels that convolve over different axes, so that the computational complexity of each kernel is quadratic only in the axial resolution rather than in the overall resolution. The deterministic forecasting accuracy of the proposed model at $1.5^\circ$ resolution and 0-7 days' lead time is on par with state-of-the-art purely data-driven machine learning weather prediction models. We also show that the proposed model has the potential to push forward the accuracy-efficiency Pareto front for Transformer weather models: it achieves better accuracy at lower computational cost than Transformer-based models with standard attention.
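To make the complexity argument concrete, the sketch below compares the number of attention-score entries for full attention over all grid points with attention applied separately along each axis. It uses generic axial self-attention in NumPy as a stand-in; it is not the paper's factorized kernel, and the 120 x 240 latitude-longitude grid is an assumption about what a $1.5^\circ$ spacing roughly gives. The names `axial_attention` and `score_entries` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along one axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def axial_attention(x, axis):
    """Self-attention restricted to one grid axis (illustrative only).

    x has shape (n_lat, n_lon, channels); axis is 0 (latitude) or 1 (longitude).
    Queries, keys, and values are all x itself, with no learned projections.
    """
    # Put the attended axis second-to-last so every line along it is a sequence.
    seq = np.moveaxis(x, axis, -2)                      # (..., L, C)
    scores = seq @ np.swapaxes(seq, -1, -2)             # (..., L, L)
    weights = softmax(scores / np.sqrt(x.shape[-1]), axis=-1)
    return np.moveaxis(weights @ seq, -2, axis)         # back to (n_lat, n_lon, C)


def score_entries(n_lat, n_lon):
    """Count attention-score entries for full vs. axis-wise attention."""
    n_points = n_lat * n_lon
    full = n_points ** 2                  # every point attends to every point
    axial = n_points * (n_lat + n_lon)    # each point attends along its row and column
    return full, axial


# Assumed grid for 1.5 degree spacing: 120 latitudes x 240 longitudes.
full, axial = score_entries(120, 240)
print(full, axial, full // axial)

# Apply attention along latitude, then longitude, on a small random field.
field = np.random.default_rng(0).normal(size=(6, 12, 4))
out = axial_attention(axial_attention(field, axis=0), axis=1)
print(out.shape)
```

On the assumed grid, the axis-wise variant needs 80 times fewer score entries than full attention, and the gap widens as resolution increases because the full count grows with the square of the total number of points.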

Code Implementations: 1 repo
