LG | Apr 30, 2024

Analyzing and Exploring Training Recipes for Large-Scale Transformer-Based Weather Prediction

Jared D. Willard, Peter Harrington, Shashank Subramanian, Ankur Mahesh, Travis A. O'Brien, William D. Collins

arXiv:2404.19630v1 | 15.720 citations | h-index: 18 | Has Code | Artif Intell Earth Syst

Originality: Incremental advance

AI Analysis

This work addresses the need for reproducible and efficient training methods in weather prediction, though it is incremental as it builds on existing transformer architectures.

The study tackled the problem of inconsistent training recipes in deep learning for weather prediction by showing that a minimally modified SwinV2 transformer trained on ERA5 data achieves superior forecast skill compared to IFS, with ablations on loss functions and model size.

The rapid rise of deep learning (DL) in numerical weather prediction (NWP) has led to a proliferation of models that forecast atmospheric variables with skill comparable or superior to that of traditional physics-based NWP. However, among these leading DL models, there is wide variance in both the training settings and the architectures used. Further, the lack of thorough ablation studies makes it hard to discern which components are most critical to success. In this work, we show that it is possible to attain high forecast skill even with relatively off-the-shelf architectures, simple training procedures, and moderate compute budgets. Specifically, we train a minimally modified SwinV2 transformer on ERA5 data, and find that it attains superior forecast skill when compared against IFS. We present ablations on key aspects of the training pipeline, exploring different loss functions, model sizes and depths, and multi-step fine-tuning to investigate their effect. We also examine model performance with metrics beyond the typical ACC and RMSE, and investigate how performance scales with model size.
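For readers unfamiliar with the two standard metrics named in the abstract, the following is a minimal sketch of latitude-weighted RMSE and ACC (anomaly correlation coefficient) on a small latitude-longitude grid. It follows a convention common in the DL weather literature (cosine-latitude weights normalized to mean 1, anomalies taken relative to a climatology); the abstract does not give the paper's exact definitions, so the function names, the weighting scheme, and the toy grid layout are all assumptions made for illustration. Plain Python is used to keep the sketch dependency-free; a real pipeline would likely use an array library.

```python
import math


def lat_weights(lats_deg):
    """Cosine-latitude weights normalized to mean 1 (assumed convention)."""
    raw = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def weighted_rmse(forecast, truth, lats_deg):
    """Latitude-weighted root-mean-square error.

    forecast and truth are lists of rows, one row of longitude
    values per latitude in lats_deg (hypothetical layout).
    """
    weights = lat_weights(lats_deg)
    total, count = 0.0, 0
    for w, f_row, t_row in zip(weights, forecast, truth):
        for f, t in zip(f_row, t_row):
            total += w * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)


def weighted_acc(forecast, truth, climatology, lats_deg):
    """Latitude-weighted anomaly correlation coefficient.

    Anomalies are differences from the climatology; the result
    lies in [-1, 1], with 1 meaning perfectly correlated anomalies.
    """
    weights = lat_weights(lats_deg)
    num = f_sq = t_sq = 0.0
    for w, f_row, t_row, c_row in zip(weights, forecast, truth, climatology):
        for f, t, c in zip(f_row, t_row, c_row):
            fa, ta = f - c, t - c  # forecast and truth anomalies
            num += w * fa * ta
            f_sq += w * fa * fa
            t_sq += w * ta * ta
    return num / math.sqrt(f_sq * t_sq)
```

Lower RMSE and higher ACC both indicate better forecasts. The cosine weighting compensates for grid cells shrinking in area toward the poles on a regular latitude-longitude grid.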
