ML LGMay 31, 2018

Assessing Generative Models via Precision and Recall

Mehdi S. M. Sajjadi, Olivier Bachem, Mario Lucic, Olivier Bousquet, Sylvain Gelly

arXiv:1806.00035v241.6745 citationsHas Code

Originality Incremental advance

AI Analysis

This addresses the need for better evaluation metrics in generative modeling, offering a more nuanced tool for researchers, though it is incremental as it builds on existing divergence methods.

The paper tackles the problem of evaluating generative models by proposing a novel definition of precision and recall for distributions, which disentangles divergence into two dimensions to distinguish between sample quality and coverage. The result is an efficient algorithm that empirically shows the metric can separate these aspects in experiments on GANs and VAEs.

Recent advances in generative modeling have led to an increased interest in the study of statistical divergences as means of model comparison. Commonly used evaluation methods, such as the Frechet Inception Distance (FID), correlate well with the perceived quality of samples and are sensitive to mode dropping. However, these metrics are unable to distinguish between different failure cases since they only yield one-dimensional scores. We propose a novel definition of precision and recall for distributions which disentangles the divergence into two separate dimensions. The proposed notion is intuitive, retains desirable properties, and naturally leads to an efficient algorithm that can be used to evaluate generative models. We relate this notion to total variation as well as to recent evaluation metrics such as Inception Score and FID. To demonstrate the practical utility of the proposed approach we perform an empirical study on several variants of Generative Adversarial Networks and Variational Autoencoders. In an extensive set of experiments we show that the proposed metric is able to disentangle the quality of generated samples from the coverage of the target distribution.

View on arXiv PDF Code

Similar