Comparing and Contrasting DLWP Backbones on Navier-Stokes and Atmospheric Dynamics
This work addresses the need for standardized evaluation of deep learning weather forecasting models, which matters to both researchers and practitioners. Its contribution is incremental, however, because it compares existing methods rather than introducing new ones.
The paper compares deep learning weather prediction backbones under controlled conditions. It finds that FNO performs best on synthetic Navier-Stokes data, that ConvLSTM and SwinTransformer are suitable for short-to-mid-range forecasts on real-world data, and that GraphCast and Spherical FNO show superior stability in long-range rollouts of up to 50 years.
A large number of Deep Learning Weather Prediction (DLWP) architectures, based on various backbones including U-Net, Transformer, Graph Neural Network, and Fourier Neural Operator (FNO), have demonstrated their potential for forecasting atmospheric states. However, due to differences in training protocols, forecast horizons, and data choices, it remains unclear which (if any) of these methods and architectures are most suitable for weather forecasting and for future model development. Here, we step back and provide a detailed empirical analysis, under controlled conditions, comparing and contrasting the most prominent DLWP models, along with their backbones. We accomplish this by predicting synthetic two-dimensional incompressible Navier-Stokes dynamics and real-world global weather dynamics. On synthetic data, we observe favorable performance of FNO, while on the real-world WeatherBench dataset, our results demonstrate the suitability of ConvLSTM and SwinTransformer for short-to-mid-range forecasts. For long-range weather rollouts of up to 50 years, we observe superior stability and physical soundness in architectures that formulate a spherical data representation, i.e., GraphCast and Spherical FNO. The code is available at https://github.com/amazon-science/dlwp-benchmark.