Adversarial sampling of unknown and high-dimensional conditional distributions
This work addresses a specific challenge in engineering and data science: sampling conditional distributions with continuous conditioning variables. It is, however, incremental, as it builds on existing GAN methods combined with conditional moment estimation.
The paper tackles the problem of sampling from unknown high-dimensional conditional distributions, particularly when only sparse data are available for continuous conditioning variables, by using generative adversarial networks (GANs) with a priori estimation of conditional moments. On a turbulent flow deconvolution case, it shows that the proposed algorithm effectively samples the target distributions with minimal loss of sample quality compared to state-of-the-art methods.
Many engineering problems require the prediction of realization-to-realization variability or a refined description of modeled quantities. In such cases, it is necessary to sample elements from unknown high-dimensional spaces with possibly millions of degrees of freedom. While methods exist that can sample elements from probability density functions (PDFs) with known shapes, several approximations must be made when the distribution is unknown. In this paper, both the sampling and the inference of the underlying distribution are handled with a data-driven approach, generative adversarial networks (GANs), in which two competing neural networks are trained so that one of them learns to generate samples from the training-set distribution. In practice, it is often necessary to draw samples from conditional distributions. When the conditioning variables are continuous, only one (if any) data point corresponding to a particular value of a conditioning variable may be available, which is not sufficient to estimate the conditional distribution. This work handles this problem using an a priori estimation of the conditional moments of a PDF. Two approaches for computing these moments, stochastic estimation and an external neural network, are compared here; however, any preferred method can be used. The algorithm is demonstrated on the deconvolution of a filtered turbulent flow field. It is shown that all versions of the proposed algorithm effectively sample the target conditional distribution with minimal impact on the quality of the samples compared to state-of-the-art methods. Additionally, the procedure can be used as a metric for the diversity of samples generated by a conditional GAN (cGAN) conditioned on continuous variables.
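The abstract does not specify how the a priori moments enter the cGAN, so the following is only a minimal sketch under stated assumptions, written with NumPy. It illustrates two pieces. The first is polynomial stochastic estimation of the conditional mean and conditional variance from paired (conditioning, target) data, meaning least-squares fits on powers of the conditioning variables. The second is a moment-mismatch term that compares several generator draws per conditioning value against those estimates. The function names (poly_features, fit_conditional_moments, moment_mismatch), the quadratic feature set without cross terms, the choice to fit squared residuals for the variance, and the squared-error form of the penalty are all hypothetical illustrations, not the authors' implementation. The synthetic data in the demo are likewise invented for illustration.

```python
import numpy as np


def poly_features(cond, degree=2):
    """Constant term plus elementwise powers of the conditioning variables.

    Hypothetical choice: cross terms between conditioning variables are omitted.
    """
    cols = [np.ones((cond.shape[0], 1))]
    cols += [cond**p for p in range(1, degree + 1)]
    return np.hstack(cols)


def fit_conditional_moments(cond, target, degree=2):
    """Polynomial stochastic estimation of E[target | cond] and Var[target | cond].

    cond:   (N, d) conditioning variables (e.g. filtered-field features)
    target: (N, m) quantities to sample (e.g. unfiltered-field values)
    Returns a function mapping new conditioning values to (mean, var).
    """
    feats = poly_features(cond, degree)
    # Conditional mean: least-squares fit of the target on the features.
    mean_coef, *_ = np.linalg.lstsq(feats, target, rcond=None)
    # Conditional variance (assumption): least-squares fit of squared residuals.
    resid2 = (target - feats @ mean_coef) ** 2
    var_coef, *_ = np.linalg.lstsq(feats, resid2, rcond=None)

    def predict(new_cond):
        f = poly_features(new_cond, degree)
        mean = f @ mean_coef
        # A polynomial fit can dip slightly below zero; clip to a valid variance.
        var = np.maximum(f @ var_coef, 0.0)
        return mean, var

    return predict


def moment_mismatch(generated, est_mean, est_var):
    """Squared mismatch between generated and a priori conditional moments.

    generated:         (n_cond, n_draws, m) generator outputs, n_draws per condition
    est_mean, est_var: (n_cond, m) a priori estimates for the same conditions
    """
    gen_mean = generated.mean(axis=1)
    gen_var = generated.var(axis=1)
    return np.mean((gen_mean - est_mean) ** 2) + np.mean((gen_var - est_var) ** 2)


# Demo on hypothetical synthetic data with a known answer:
# y = 2x + noise, with conditional variance 0.1 + 0.2 x^2.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200_000, 1))
y = 2.0 * x + np.sqrt(0.1 + 0.2 * x**2) * rng.standard_normal(x.shape)
predict = fit_conditional_moments(x, y)

x_query = np.array([[0.0], [0.5], [1.0]])
mean_est, var_est = predict(x_query)  # close to [0, 1, 2] and [0.1, 0.15, 0.3]

# Stand-ins for generator output: 64 draws per query point, either diverse
# (drawn from the true conditional distribution) or collapsed onto the mean.
true_std = np.sqrt(0.1 + 0.2 * x_query**2)
draws = rng.standard_normal((3, 64, 1))
diverse = 2.0 * x_query[:, None, :] + true_std[:, None, :] * draws
collapsed = np.repeat(mean_est[:, None, :], 64, axis=1)
penalty_diverse = moment_mismatch(diverse, mean_est, var_est)
penalty_collapsed = moment_mismatch(collapsed, mean_est, var_est)
```

A generator that returns the same output for every draw matches the conditional mean but misses the conditional variance, so it scores a clearly larger mismatch than a diverse one. This illustrates, under the assumptions above, why such a moment comparison can double as a diversity measure for a cGAN with continuous conditioning. In actual training, the same term would be written in an automatic-differentiation framework and added to the generator loss. The NumPy version here is only suitable for evaluation.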