Evaluating Lossy Compression Rates of Deep Generative Models
This addresses the problem of incomplete evaluation for researchers in deep generative modeling, though it is incremental as it builds on existing metrics.
The paper tackled the challenge of quantitatively evaluating deep generative models by proposing rate distortion (RD) curves as a metric, showing that this approach provides a more complete picture than scalar-valued metrics like log-likelihood and yields insights not obtainable from log-likelihoods alone.
The field of deep generative modeling has succeeded in producing astonishingly realistic-seeming images and audio, but quantitative evaluation remains a challenge. Log-likelihood is an appealing metric due to its grounding in statistics and information theory, but it can be challenging to estimate for implicit generative models, and scalar-valued metrics give an incomplete picture of a model's quality. In this work, we propose to use rate distortion (RD) curves to evaluate and compare deep generative models. While estimating RD curves is seemingly even more computationally demanding than log-likelihood estimation, we show that we can approximate the entire RD curve using nearly the same computations as were previously used to achieve a single log-likelihood estimate. We evaluate lossy compression rates of VAEs, GANs, and adversarial autoencoders (AAEs) on the MNIST and CIFAR10 datasets. Measuring the entire RD curve gives a more complete picture than scalar-valued metrics, and we arrive at a number of insights not obtainable from log-likelihoods alone.