LG | AI | CL | Dec 30, 2022

MAUVE Scores for Generative Models: Theory and Practice

UW
arXiv:2212.14578v2 | 41 citations | h-index: 111
Originality: Incremental advance
AI Analysis

MAUVE provides a practical tool for diagnosing and improving generative models in the text and image domains, though the advance is incremental because it builds on existing divergence-based metrics.

The paper addresses the problem of automatically measuring how close a generated data distribution is to its target distribution. It presents MAUVE scores, which quantify the gap between human-written text and text from neural language models, correlate with human judgments, and identify known properties of generated texts and images.

Generative artificial intelligence has made significant strides, producing text indistinguishable from human prose and remarkably photorealistic images. Automatically measuring how close the generated data distribution is to the target distribution is central to diagnosing existing models and developing better ones. We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images. These scores are statistical summaries of divergence frontiers capturing two types of errors in generative modeling. We explore three approaches to statistically estimate these scores: vector quantization, non-parametric estimation, and classifier-based estimation. We provide statistical bounds for the vector quantization approach. Empirically, we find that the proposed scores paired with a range of $f$-divergences and statistical estimation methods can quantify the gaps between the distributions of human-written text and those of modern neural language models by correlating with human judgments and identifying known properties of the generated texts. We demonstrate in the vision domain that MAUVE can identify known properties of generated images on par with or better than existing metrics. In conclusion, we present practical recommendations for using MAUVE effectively with language and image modalities.
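The abstract describes the scores as summaries of a divergence frontier computed after vector quantization. The following is a minimal sketch of that idea, not the authors' implementation. It assumes both samples have already been quantized into cluster labels, uses the KL divergence as one choice among the $f$-divergences the abstract mentions, and summarizes the frontier by the area under it. The function names (`histogram`, `divergence_frontier`, `frontier_area`) and the scaling constant `c = 5.0` are illustrative assumptions.

```python
import math


def histogram(labels, num_clusters):
    """Turn cluster labels (ints in [0, num_clusters)) into a discrete distribution."""
    counts = [0] * num_clusters
    for label in labels:
        counts[label] += 1
    total = len(labels)
    return [count / total for count in counts]


def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def divergence_frontier(p, q, num_points=99, c=5.0):
    """Trace the frontier using mixtures r = lam * p + (1 - lam) * q.

    Each mixture yields the point (exp(-c * KL(q || r)), exp(-c * KL(p || r))).
    The scaling constant c is an assumption chosen for illustration.
    The path starts at (1, 0) and ends at (0, 1), with lam increasing.
    """
    points = [(1.0, 0.0)]
    for i in range(1, num_points + 1):
        lam = i / (num_points + 1)
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        points.append((math.exp(-c * kl(q, r)), math.exp(-c * kl(p, r))))
    points.append((0.0, 1.0))
    return points


def frontier_area(points):
    """Trapezoidal area under the frontier; x decreases along the path."""
    return sum(
        (x0 - x1) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    # Hypothetical cluster labels for a "human" and a "model" sample.
    human = histogram([0, 0, 1, 1, 2, 2, 3, 3], num_clusters=4)
    model = histogram([0, 0, 0, 1, 1, 2, 2, 2], num_clusters=4)
    print(frontier_area(divergence_frontier(human, model)))
```

Under these assumptions the score lies between 0 and 1: identical histograms give an area close to 1, and histograms with disjoint support give an area close to 0. In practice the cluster labels would come from quantizing learned embeddings of the text or images, a step this sketch leaves out.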

Code Implementations: 1 repo
