LG · Feb 26

Scaling Laws of Global Weather Models

Yuejiang Yu, Langwen Huang, Alexandru Calotoiu, Torsten Hoefler

arXiv:2602.22962v14.93 citations · h-index: 13

Originality: Incremental advance

AI Analysis

This work provides crucial guidance for researchers and practitioners in optimizing the development and training of future data-driven weather forecasting models, potentially leading to more accurate and efficient predictions.

This paper investigates the empirical scaling laws of data-driven global weather models, analyzing how model performance depends on model size, dataset size, and compute budget. It finds that Aurora shows the strongest data scaling (a 10x increase in training data reduces validation loss by up to 3.2x), and its compute-optimal analysis suggests that, under a fixed compute budget, training longer is better than enlarging the model.

Data-driven models are revolutionizing weather forecasting. To optimize training efficiency and model performance, this paper analyzes empirical scaling laws within this domain. We investigate the relationship between model performance (validation loss) and three key factors: model size ($N$), dataset size ($D$), and compute budget ($C$). Across a range of models, we find that Aurora exhibits the strongest data-scaling behavior: increasing the training dataset by 10x reduces validation loss by up to 3.2x. GraphCast demonstrates the highest parameter efficiency, yet suffers from limited hardware utilization. Our compute-optimal analysis indicates that, under fixed compute budgets, allocating resources to longer training durations yields greater performance gains than increasing model size. Furthermore, we analyze model shape and uncover scaling behaviors that differ fundamentally from those observed in language models: weather forecasting models consistently favor increased width over depth. These findings suggest that future weather models should prioritize wider architectures and larger effective training datasets to maximize predictive performance.
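To make the data-scaling figure concrete, the sketch below converts "10x more data reduces validation loss by up to 3.2x" into a power-law exponent and shows how such an exponent could be recovered from measurements with a log-log fit. It assumes a pure power law $L(D) = a \cdot D^{-\alpha}$ with no irreducible-loss term; the abstract does not state the functional form the authors fit, so this form, the helper names `implied_exponent` and `fit_power_law`, and the synthetic data points are illustrative assumptions, not results from the paper.

```python
import math

import numpy as np


def implied_exponent(data_factor: float, loss_factor: float) -> float:
    """Exponent alpha such that multiplying D by data_factor divides L by loss_factor.

    Assumes L(D) = a * D**(-alpha), an illustrative form not confirmed by the abstract.
    """
    return math.log(loss_factor) / math.log(data_factor)


def fit_power_law(sizes: np.ndarray, losses: np.ndarray) -> tuple[float, float]:
    """Least-squares fit of log L = log a - alpha * log D; returns (a, alpha)."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
    return float(np.exp(intercept)), float(-slope)


# Exponent implied by the headline figure: 10x data -> up to 3.2x lower loss.
alpha = implied_exponent(10.0, 3.2)

# Synthetic, noise-free points generated from the assumed power law
# (hypothetical values, NOT measurements from the paper).
sizes = np.array([1e3, 1e4, 1e5, 1e6])
losses = 2.0 * sizes ** (-alpha)

a_fit, alpha_fit = fit_power_law(sizes, losses)
print(f"implied alpha = {alpha:.3f}")
print(f"fitted a = {a_fit:.3f}, fitted alpha = {alpha_fit:.3f}")
```

Under this assumption the implied exponent is $\alpha = \log_{10} 3.2 \approx 0.505$, so validation loss falls roughly as the inverse square root of dataset size. Since the abstract reports "up to 3.2x", this is best read as an upper bound on the exponent rather than a fitted value.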
