AO-PH LG Sep 28, 2023

Navigating the Noise: Bringing Clarity to ML Parameterization Design with O(100) Ensembles

Jerry Lin, Sungduk Yu, Liran Peng, Tom Beucler, Eliot Wong-Toi, Zeyuan Hu, Pierre Gentine, Margarita Geleta, Mike Pritchard

arXiv:2309.16177v34.37 citations h-index: 73 Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of designing reliable ML parameterizations for climate modeling. Its advance is incremental: it improves the sampling used to evaluate online performance.

The study tackles uncertainty in machine-learning parameterizations of subgrid processes in climate models by building a pipeline for ensembles of O(100) hybrid simulations. It finds that some reductions in offline error increase online error while others do not, and that large ensembles are needed to detect causal differences.

Machine-learning (ML) parameterizations of subgrid processes (here of turbulence, convection, and radiation) may one day replace conventional parameterizations by emulating high-resolution physics without the cost of explicit simulation. However, uncertainty about the relationship between offline and online performance (i.e., when integrated with a large-scale general circulation model (GCM)) hinders their development. Much of this uncertainty stems from limited sampling of the noisy, emergent effects of upstream ML design decisions on downstream online hybrid simulation. Our work rectifies the sampling issue via the construction of a semi-automated, end-to-end pipeline for $\mathcal{O}(100)$ size ensembles of hybrid simulations, revealing important nuances in how systematic reductions in offline error manifest in changes to online error and online stability. For example, removing dropout and switching from a Mean Squared Error (MSE) to a Mean Absolute Error (MAE) loss both reduce offline error, but they have opposite effects on online error and online stability. Other design decisions, like incorporating memory, converting moisture input from specific humidity to relative humidity, using batch normalization, and training on multiple climates do not come with any such compromises. Finally, we show that ensemble sizes of $\mathcal{O}(100)$ may be necessary to reliably detect causally relevant differences online. By enabling rapid online experimentation at scale, we can empirically settle debates regarding subgrid ML parameterization design that would have otherwise remained unresolved in the noise.
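
The closing claim about ensemble size can be illustrated with a small synthetic experiment. The sketch below is not the authors' pipeline or their statistical method: it assumes two hypothetical design choices whose online errors are Gaussian with a small difference in mean, and uses a permutation test (a technique swapped in here for illustration) to estimate how often that difference is detected at a given ensemble size. The effect size, noise level, and function names are all invented for the example.

```python
import numpy as np


def permutation_pvalue(a, b, n_perm=500, rng=None):
    """Two-sided permutation test on the difference in means of a and b."""
    rng = np.random.default_rng() if rng is None else rng
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    # Draw n_perm random permutations of the pooled sample at once:
    # argsort of uniform noise gives one random ordering per row.
    idx = rng.random((n_perm, pooled.size)).argsort(axis=1)
    shuffled = pooled[idx]
    diffs = np.abs(
        shuffled[:, : a.size].mean(axis=1) - shuffled[:, a.size :].mean(axis=1)
    )
    # Add-one correction keeps the p-value strictly positive.
    return (np.sum(diffs >= observed) + 1) / (n_perm + 1)


def detection_rate(n_members, effect=0.3, sigma=1.0, trials=200, alpha=0.05, seed=0):
    """Fraction of synthetic experiments in which the difference is detected.

    effect and sigma are hypothetical values, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        # Synthetic "online error" for two design choices, one slightly better.
        errors_a = rng.normal(0.0, sigma, n_members)
        errors_b = rng.normal(-effect, sigma, n_members)
        hits += int(permutation_pvalue(errors_a, errors_b, rng=rng) < alpha)
    return hits / trials


for size in (5, 20, 100):
    print(f"ensemble size {size:3d}: detection rate {detection_rate(size):.2f}")
```

Under these assumptions, small ensembles rarely separate the two design choices from the noise, while ensembles near 100 members detect the difference far more often. The exact rates depend entirely on the assumed effect size and noise level.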
