AO-PH | LG | Oct 6, 2025

Benchmarking atmospheric circulation variability in an AI emulator, ACE2, and a hybrid model, NeuralGCM

arXiv:2510.04466v1 | 3 citations | h-index: 29 | Geophys Res Lett
Originality: Synthesis-oriented
AI Analysis

This work provides benchmarking tools for AI model development in atmospheric dynamics. The contribution is incremental, but it may be essential for out-of-distribution applications such as extrapolating to unseen climates.

The study evaluated an AI emulator (ACE2-ERA5) and a hybrid model (NeuralGCM) on four atmospheric variability metrics. Both captured the spectra of large-scale tropical waves and extratropical eddy-mean flow interactions, but both struggled to capture the timescales of the quasi-biennial oscillation (~28 months) and Southern annular mode propagation (~150 days).

Physics-based atmosphere-land models with prescribed sea surface temperature have had notable successes, but they also show biases in representing atmospheric variability relative to observations. Recently, AI emulators and hybrid models have emerged with the potential to overcome these biases, but they still require systematic evaluation against metrics grounded in fundamental atmospheric dynamics. Here, we evaluate the representation of four atmospheric variability benchmarking metrics in a fully data-driven AI emulator (ACE2-ERA5) and a hybrid model (NeuralGCM). The hybrid model and the emulator can capture the spectra of large-scale tropical waves and extratropical eddy-mean flow interactions, including critical levels. However, both struggle to capture the timescales associated with the quasi-biennial oscillation (QBO, $\sim 28$ months) and Southern annular mode propagation ($\sim 150$ days). These dynamical metrics serve as an initial benchmarking tool to inform AI model development and to understand the models' limitations, which may be essential for out-of-distribution applications (e.g., extrapolating to unseen climates).
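To make the idea of a timescale metric concrete, the following is a minimal sketch of how a dominant period such as the QBO's $\sim 28$ months could be read off a monthly time series with a simple periodogram. It is not the diagnostic used in the paper. The function name `dominant_period`, the synthetic sinusoid standing in for an equatorial zonal-wind index, and the 336-month record length are all assumptions made for illustration.

```python
import numpy as np


def dominant_period(series, dt=1.0):
    """Return the period (in units of dt) of the largest periodogram peak.

    Hypothetical helper for illustration; not taken from the paper.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()  # remove the mean so the zero-frequency bin does not dominate
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=dt)
    k = np.argmax(power[1:]) + 1  # skip the zero-frequency bin
    return 1.0 / freqs[k]


# Synthetic stand-in for a monthly QBO wind index (assumed, not real data):
# a pure 28-month oscillation sampled over 336 months (12 full cycles).
months = np.arange(336)
u = 20.0 * np.sin(2.0 * np.pi * months / 28.0)

period = dominant_period(u, dt=1.0)
print(f"dominant period: {period:.1f} months")
```

Applied to a model's simulated index and to a reanalysis index in turn, a comparison of the two recovered periods would give one crude check of whether the model reproduces the observed timescale. Real series are noisy and only quasi-periodic, so a practical analysis would need a longer record, windowing, or averaging over segments.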

