CL | Jun 20, 2023

Open-Domain Text Evaluation via Contrastive Distribution Methods

Berkeley, Meta AI, Microsoft, U of Toronto
arXiv:2306.11879v4 | 2 citations | h-index: 50
Originality: Incremental advance
AI Analysis

This paper addresses the problem of assessing generation quality, which matters to NLP researchers and practitioners. It offers a new evaluation approach that incrementally improves on existing automatic metrics.

The paper tackles the challenge of evaluating open-domain text generation by introducing Contrastive Distribution Methods (CDM), which map the contrast between two probabilistic distributions to a quality measure. The authors report that CDM correlates better with human judgment than existing automatic metrics.

Recent advancements in open-domain text generation, driven by the power of large pre-trained language models (LLMs), have demonstrated remarkable performance. However, assessing these models' generation quality remains a challenge. In this paper, we introduce a novel method for evaluating open-domain text generation called Contrastive Distribution Methods (CDM). Leveraging the connection between increasing model parameters and enhanced LLM performance, CDM creates a mapping from the _contrast_ of two probabilistic distributions -- one known to be superior to the other -- to quality measures. We investigate CDM for open-domain text generation evaluation under two paradigms: 1) _Generative_ CDM, which harnesses the contrast of two language models' distributions to generate synthetic examples for training discriminator-based metrics; 2) _Discriminative_ CDM, which directly uses distribution disparities between two language models for evaluation. Our experiments on coherence evaluation for multi-turn dialogue and commonsense evaluation for controllable generation demonstrate that CDM correlates better with human judgment than existing automatic evaluation metrics, highlighting the strong performance and generalizability of our approach.
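To make the _Discriminative_ paradigm concrete, here is a minimal Python sketch of scoring a text by the disparity between a stronger and a weaker model. It is an illustration under stated assumptions, not the authors' implementation: the toy unigram tables stand in for real language models, the names `sequence_log_prob` and `contrast_score` are hypothetical, and averaging the per-token log-probability difference is one plausible aggregation that the abstract does not specify.

```python
import math
from typing import Dict, List

# Toy unigram "language models": token -> probability.
# ASSUMPTION: these tables are stand-ins for a larger (stronger) and a
# smaller (weaker) pre-trained model; they are not from the paper.
UnigramModel = Dict[str, float]


def sequence_log_prob(tokens: List[str], model: UnigramModel,
                      floor: float = 1e-6) -> float:
    """Sum of token log-probabilities; unseen tokens get a small floor."""
    return sum(math.log(model.get(tok, floor)) for tok in tokens)


def contrast_score(tokens: List[str], strong: UnigramModel,
                   weak: UnigramModel) -> float:
    """Hypothetical quality score: mean per-token log-probability gap.

    Positive values mean the stronger model prefers the text more than
    the weaker model does; negative values mean the opposite.
    """
    if not tokens:
        raise ValueError("cannot score an empty sequence")
    gap = sequence_log_prob(tokens, strong) - sequence_log_prob(tokens, weak)
    return gap / len(tokens)


# Illustrative probabilities only.
strong_lm = {"the": 0.5, "cat": 0.3, "sat": 0.2}
weak_lm = {"the": 0.4, "cat": 0.1, "sat": 0.5}

favoured = contrast_score(["the", "cat"], strong_lm, weak_lm)
disfavoured = contrast_score(["sat", "sat"], strong_lm, weak_lm)
```

In this toy setup `favoured` is positive and `disfavoured` is negative, so ranking candidate texts by `contrast_score` orders them by how much more the stronger model prefers them. With real models, the two unigram tables would be replaced by token log-probabilities from a large and a small LLM that share a tokenizer.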

Code Implementations: 1 repo