LG | AO-PH | Dec 16, 2020

Copula-based synthetic data augmentation for machine-learning emulators

arXiv:2012.09037v3 | 38 citations
AI Analysis

This work addresses the problem of data scarcity for machine-learning emulators in weather and climate modeling, offering an incremental improvement through synthetic data generation.

This paper investigates copula-based synthetic data augmentation as a way to improve machine-learning emulators when training data are scarce. The authors apply the method to a toy physical model of downwelling longwave radiation and its neural network emulator, and report improvements in mean absolute error of up to 62% (from 1.17 to 0.44 W m$^{-2}$).

Can we improve machine-learning (ML) emulators with synthetic data? If data are scarce or expensive to source and a physical model is available, statistically generated data may be useful for augmenting training sets cheaply. Here we explore the use of copula-based models for generating synthetically augmented datasets in weather and climate by testing the method on a toy physical model of downwelling longwave radiation and corresponding neural network emulator. Results show that for copula-augmented datasets, predictions are improved by up to 62 % for the mean absolute error (from 1.17 to 0.44 W m$^{-2}$).
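To make the idea of copula-based augmentation concrete, the following is a minimal sketch of sampling synthetic inputs from a Gaussian copula with empirical marginals, using only the Python standard library. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which copula families or software were used, the function names are hypothetical, and the two "temperature" and "humidity" variables are made-up stand-ins for real model inputs. In the workflow described above, the synthetic inputs would then be passed through the physical model to obtain matching outputs before being added to the training set.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal used for the copula transforms


def to_normal_scores(column):
    """Map one variable to normal scores via its empirical ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps the pseudo-observations strictly inside (0, 1)
        scores[i] = _STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def correlation_matrix(score_columns):
    """Pearson correlation of the normal scores (the Gaussian copula parameter)."""
    d, n = len(score_columns), len(score_columns[0])
    means = [sum(c) / n for c in score_columns]

    def cov(a, b):
        return sum((score_columns[a][k] - means[a]) * (score_columns[b][k] - means[b])
                   for k in range(n)) / n

    return [[cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5 for b in range(d)]
            for a in range(d)]


def cholesky(m):
    """Lower-triangular Cholesky factor; assumes m is positive definite."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = (m[i][i] - s) ** 0.5
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def empirical_quantile(sorted_column, u):
    """Inverse empirical CDF, interpolating linearly between order statistics."""
    pos = u * (len(sorted_column) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_column) - 1)
    return sorted_column[lo] + (pos - lo) * (sorted_column[hi] - sorted_column[lo])


def sample_gaussian_copula(columns, n_samples, seed=0):
    """Draw synthetic rows that mimic the marginals and rank dependence of `columns`."""
    rng = random.Random(seed)
    low = cholesky(correlation_matrix([to_normal_scores(c) for c in columns]))
    sorted_cols = [sorted(c) for c in columns]
    d = len(columns)
    synthetic_rows = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Correlate the independent normals with the Cholesky factor
        correlated = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        # Normal CDF gives uniforms; the empirical quantile restores original units
        synthetic_rows.append([
            empirical_quantile(sorted_cols[i], _STD_NORMAL.cdf(correlated[i]))
            for i in range(d)
        ])
    return synthetic_rows


# Hypothetical toy inputs: two positively correlated variables (not the paper's data)
data_rng = random.Random(42)
temperature = [data_rng.gauss(280.0, 10.0) for _ in range(200)]
humidity = [0.5 * t + data_rng.gauss(0.0, 2.0) for t in temperature]

synthetic = sample_gaussian_copula([temperature, humidity], n_samples=500, seed=1)
print(len(synthetic), "synthetic rows, first row:", synthetic[0])
```

Because the marginals are inverted through the empirical quantile function, this sketch never produces values outside the observed range of each variable. A parametric or kernel-based marginal would be needed to extrapolate beyond it.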
