Latent-Constrained Conditional VAEs for Augmenting Large-Scale Climate Ensembles
This work addresses the computational expense of large climate-model ensembles, a practical concern for climate scientists. Its contribution is incremental, since it builds on existing CVAE methods and adds specific latent-space constraints.
The paper addresses the problem of generating additional, statistically consistent climate realizations from a limited number of ensemble runs. It introduces a latent-constrained conditional variational autoencoder (LC-CVAE) that enforces cross-realization homogeneity of latent embeddings at anchor locations, which improves generalization to unseen ensemble members and exposes a trade-off between spatial coverage and reconstruction quality.
Large climate-model ensembles are computationally expensive; yet many downstream analyses would benefit from additional, statistically consistent realizations of spatiotemporal climate variables. We study a generative modeling approach for producing new realizations from a limited set of available runs by transferring structure learned across an ensemble. Using monthly near-surface temperature time series from ten independent reanalysis realizations (ERA5), we find that a vanilla conditional variational autoencoder (CVAE) trained jointly across realizations yields a fragmented latent space that fails to generalize to unseen ensemble members. To address this, we introduce a latent-constrained CVAE (LC-CVAE) that enforces cross-realization homogeneity of latent embeddings at a small set of shared geographic 'anchor' locations. We then use multi-output Gaussian process regression in the latent space to predict latent coordinates at unsampled locations in a new realization, followed by decoding to generate full time series fields. Experiments and ablations demonstrate (i) instability when training on a single realization, (ii) diminishing returns after incorporating roughly five realizations, and (iii) a trade-off between spatial coverage and reconstruction quality that is closely linked to the average neighbor distance in latent space.
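The two ingredients described above, an anchor-homogeneity constraint and regression in latent space, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the squared-deviation form of the penalty, the RBF kernel, and the use of plain Euclidean distance between coordinates are all assumptions made for illustration. The regression step also treats each latent dimension as an independent output sharing one kernel, which is a simplification of multi-output Gaussian process regression.

```python
import numpy as np


def anchor_homogeneity_penalty(z_anchor):
    """Hypothetical penalty: spread of anchor embeddings across realizations.

    z_anchor has shape (R, A, D): R realizations, A anchor locations,
    D latent dimensions. The value is zero when every realization maps
    each anchor to the same latent point.
    """
    mean_over_realizations = z_anchor.mean(axis=0, keepdims=True)
    return float(np.mean((z_anchor - mean_over_realizations) ** 2))


def rbf_kernel(xa, xb, length_scale=1.0):
    """Squared-exponential kernel between two sets of coordinates."""
    sq_dist = (
        np.sum(xa ** 2, axis=1)[:, None]
        + np.sum(xb ** 2, axis=1)[None, :]
        - 2.0 * xa @ xb.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale ** 2)


def gp_predict_latents(coords_train, z_train, coords_query,
                       length_scale=1.0, noise=1e-6):
    """GP posterior mean of latent coordinates at unsampled locations.

    coords_train: (N, 2) sampled locations, z_train: (N, D) their latents,
    coords_query: (M, 2) unsampled locations. Returns an (M, D) array.
    Simplification (assumption): latent dimensions are independent outputs
    with a shared kernel, so no cross-output covariance is modeled.
    """
    k_tt = rbf_kernel(coords_train, coords_train, length_scale)
    k_tt = k_tt + noise * np.eye(len(coords_train))
    k_qt = rbf_kernel(coords_query, coords_train, length_scale)
    return k_qt @ np.linalg.solve(k_tt, z_train)


# Toy usage with made-up numbers: 5 sampled locations, 3 latent dimensions.
coords_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                         [1.0, 1.0], [0.5, 0.5]])
z_train = np.random.default_rng(0).normal(size=(5, 3))
coords_query = np.array([[0.25, 0.75], [0.9, 0.1]])
z_query = gp_predict_latents(coords_train, z_train, coords_query,
                             length_scale=0.5)
```

In a full pipeline, a penalty of this kind would be added to the CVAE training loss with some weight, and the predicted latents `z_query` would be passed through the trained decoder to produce time series at the unsampled locations. Both of those steps are omitted here.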