On the Quantitative Analysis of Decoder-Based Generative Models
This paper addresses the challenge of reliably evaluating generative models, a problem that matters to both researchers and practitioners in machine learning, though the contribution is incremental in that it applies an existing method to a specific domain.
The paper tackles the problem of quantifying the performance of decoder-based generative models. It proposes Annealed Importance Sampling (AIS) for log-likelihood evaluation, validates the accuracy of AIS with bidirectional Monte Carlo, and uses the technique to analyze model performance, the effectiveness of existing log-likelihood estimators, overfitting, and mode coverage.
The past several years have seen remarkable progress in generative models which produce convincing samples of images and other modalities. A shared component of many powerful generative models is a decoder network, a parametric deep neural net that defines a generative distribution. Examples include variational autoencoders, generative adversarial networks, and generative moment matching networks. Unfortunately, it can be difficult to quantify the performance of these models because of the intractability of log-likelihood estimation, and inspecting samples can be misleading. We propose to use Annealed Importance Sampling for evaluating log-likelihoods for decoder-based models and validate its accuracy using bidirectional Monte Carlo. The evaluation code is provided at https://github.com/tonywu95/eval_gen. Using this technique, we analyze the performance of decoder-based models, the effectiveness of existing log-likelihood estimators, the degree of overfitting, and the degree to which these models miss important modes of the data distribution.
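To make the idea of AIS-based log-likelihood evaluation concrete, the following is a minimal sketch on a toy one-dimensional "decoder" model, not the authors' implementation. Everything in it is an assumption chosen for illustration: the linear-Gaussian decoder x | z ~ N(a z, sigma^2) with prior z ~ N(0, 1), the function names, the linear annealing schedule, and the use of a random-walk Metropolis transition in place of a more sophisticated sampler. Because this toy model has a closed-form marginal, the AIS estimate is checked against the exact value here rather than with bidirectional Monte Carlo.

```python
import numpy as np


def log_normal(x, mean, std):
    """Log density of a univariate Gaussian N(mean, std^2) at x."""
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2


def ais_log_likelihood(x, a=2.0, sigma=0.5, n_chains=500, n_temps=500,
                       step=0.5, seed=0):
    """Estimate log p(x) for the toy model z ~ N(0,1), x|z ~ N(a*z, sigma^2).

    Intermediate targets are f_beta(z) = p(z) * p(x|z)^beta, annealing from
    the prior (beta = 0) to the unnormalized posterior (beta = 1).
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_temps)  # assumed linear schedule

    def log_f(z, beta):
        return log_normal(z, 0.0, 1.0) + beta * log_normal(x, a * z, sigma)

    z = rng.standard_normal(n_chains)  # exact samples from the prior
    log_w = np.zeros(n_chains)         # running log importance weights

    for beta_prev, beta in zip(betas[:-1], betas[1:]):
        # Weight update: log f_beta(z) - log f_beta_prev(z).
        log_w += (beta - beta_prev) * log_normal(x, a * z, sigma)
        # One random-walk Metropolis step that leaves f_beta invariant.
        proposal = z + step * rng.standard_normal(n_chains)
        log_accept = log_f(proposal, beta) - log_f(z, beta)
        accept = np.log(rng.uniform(size=n_chains)) < log_accept
        z = np.where(accept, proposal, z)

    # Numerically stable log of the mean importance weight.
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))


def exact_log_likelihood(x, a=2.0, sigma=0.5):
    """Closed-form marginal for the toy model: x ~ N(0, a^2 + sigma^2)."""
    return log_normal(x, 0.0, np.sqrt(a ** 2 + sigma ** 2))


if __name__ == "__main__":
    x_obs = 1.0
    print("AIS estimate:", ais_log_likelihood(x_obs))
    print("Exact value: ", exact_log_likelihood(x_obs))
```

In this sketch the average importance weight is an unbiased estimate of p(x), so its logarithm tends to underestimate log p(x); adding chains or intermediate temperatures should tighten the estimate. For a real decoder network the exact marginal is unavailable, which is the situation where a validation method such as bidirectional Monte Carlo becomes relevant.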