Training Discriminative Models to Evaluate Generative Ones
This work addresses the challenge of evaluating generative models, a problem relevant to machine learning researchers. The contribution is incremental, as it builds on known difficulties in generative model evaluation.
The paper tackles the problem of objectively evaluating generative models. It proposes training a classifier on generated data and using its accuracy on real test data as a proxy for how well the generative model fits the true data distribution. The results show that no generative model can fully replace real data for training discriminative models, but the original GAN and WGAN perform best on the MNIST and Fashion-MNIST datasets.
Generative models are known to be difficult to assess. Recent works, especially on generative adversarial networks (GANs), produce good visual samples of varied categories of images. However, validating their quality remains difficult, and there is no agreement on the best evaluation process. This paper aims to take a step toward an objective evaluation process for generative models. It presents a new method to assess a trained generative model by evaluating the test accuracy of a classifier trained on generated data. The test set is composed of real images, so the classifier accuracy serves as a proxy for how well the generative model fits the true data distribution. By comparing results across different generated datasets, we are able to compare and rank generative models. A further motivation of this approach is to evaluate whether generative models can help discriminative neural networks learn, i.e., to measure whether a model trained on generated data succeeds when tested on real data. Our experiments compare different generators from the Variational Auto-Encoder (VAE) and Generative Adversarial Network (GAN) frameworks on the MNIST (Modified National Institute of Standards and Technology) and Fashion-MNIST datasets. Our results show that none of the generative models can completely replace real data for training a discriminative model. They also show that the original GAN and WGAN are the best choices for generating data on MNIST and Fashion-MNIST.
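The evaluation procedure described in the abstract (train a classifier on generated data, then measure its accuracy on real test data) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the "generators" are toy class-conditional Gaussian samplers standing in for a VAE or GAN, the classifier is a nearest-centroid rule rather than a neural network, and the names `sample`, `train_nearest_centroid`, `predict`, and `fitting_score` are hypothetical.

```python
import numpy as np


def sample(means, n_per_class, rng, noise=0.5):
    """Toy class-conditional sampler (stand-in for a real dataset or a generator)."""
    X = np.concatenate(
        [m + noise * rng.standard_normal((n_per_class, len(m))) for m in means]
    )
    y = np.repeat(np.arange(len(means)), n_per_class)
    return X, y


def train_nearest_centroid(X, y):
    """Fit a very simple classifier: one centroid per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(model, X):
    """Assign each point to the class of its nearest centroid."""
    classes, centroids = model
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def fitting_score(X_gen, y_gen, X_test, y_test):
    """Train on generated data, report accuracy on REAL test data."""
    model = train_nearest_centroid(X_gen, y_gen)
    return float((predict(model, X_test) == y_test).mean())


rng = np.random.default_rng(0)

# "Real" data: three well-separated classes in 2D (hypothetical toy setup).
real_means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X_train, y_train = sample(real_means, 200, rng)
X_test, y_test = sample(real_means, 200, rng)

# A generator that nearly matches the real distribution, and a poor one
# whose class-conditional samples almost overlap.
good_means = real_means + 0.1
poor_means = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2]])

baseline = fitting_score(X_train, y_train, X_test, y_test)  # trained on real data
good = fitting_score(*sample(good_means, 200, rng), X_test, y_test)
poor = fitting_score(*sample(poor_means, 200, rng), X_test, y_test)

print(f"real-data baseline:  {baseline:.3f}")
print(f"good generator:      {good:.3f}")
print(f"poor generator:      {poor:.3f}")
```

In this toy setting, the score for each generator is read against the baseline obtained by training on real data. A generator whose samples match the real class-conditional distributions should score close to the baseline, and a generator that fits them poorly should score lower. The same comparison applies when the samplers are replaced by trained generative models and the classifier by a neural network.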