CL, Feb 2, 2021

MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers

arXiv:2102.01454v3, 521 citations
Originality: Highly original
AI Analysis

This work addresses the critical problem of accurately evaluating open-ended text generation models for the NLP research community.

This paper introduces MAUVE, a new metric for comparing machine-generated text to human text distributions using divergence frontiers. It scales to large models by operating in a quantized embedding space and correlates with human judgments across three open-ended generation tasks.

As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem. We introduce MAUVE, a comparison measure for open-ended text generation, which directly compares the learnt distribution from a text generation model to the distribution of human-written text using divergence frontiers. MAUVE scales up to modern text generation models by computing information divergences in a quantized embedding space. Through an extensive empirical study on three open-ended generation tasks, we find that MAUVE identifies known properties of generated text, scales naturally with model size, and correlates with human judgments, with fewer restrictions than existing distributional evaluation metrics.
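The idea of comparing two distributions in a quantized space via a divergence frontier can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each text has already been embedded and assigned to a cluster (the cluster indices below are made up), that the frontier is traced by exponentiated KL divergences from each distribution to their mixtures, and that the result is summarized as the area under that curve. The function names and the scaling constant `c = 5.0` are illustrative assumptions.

```python
import math
from collections import Counter


def histogram(cluster_ids, num_clusters):
    """Turn a list of cluster indices into a probability vector."""
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    return [counts.get(k, 0) / total for k in range(num_clusters)]


def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def frontier_area(p, q, num_points=100, c=5.0):
    """Area under a divergence curve between p and q (hypothetical helper).

    For each mixture r = lam * p + (1 - lam) * q, the curve contains the
    point (exp(-c * KL(q || r)), exp(-c * KL(p || r))). The scaling
    constant c is an assumption of this sketch.
    """
    points = [(0.0, 1.0)]  # limiting endpoint on the left
    # Walk lam from near 1 down to near 0 so that x increases along the curve.
    for i in range(num_points, 0, -1):
        lam = i / (num_points + 1)
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        points.append((math.exp(-c * kl(q, r)), math.exp(-c * kl(p, r))))
    points.append((1.0, 0.0))  # limiting endpoint on the right

    # Trapezoidal rule over consecutive points.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up cluster assignments for eight "human" and eight "model" texts.
human = histogram([0, 0, 1, 1, 2, 2, 3, 3], 4)
close = histogram([0, 0, 1, 1, 2, 2, 2, 3], 4)
far = histogram([0, 0, 0, 0, 0, 0, 0, 1], 4)

print(frontier_area(human, human))
print(frontier_area(human, close))
print(frontier_area(human, far))
```

Under these assumptions the score approaches 1 when the two histograms coincide and shrinks as they move apart. In practice the cluster assignments would come from quantizing embeddings of real text samples rather than from hand-written lists.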

Code Implementations: 5 repos