CL · Oct 27, 2022

Truncation Sampling as Language Model Desmoothing

John Hewitt, Christopher D. Manning, Percy Liang

Stanford

arXiv:2210.15191v1 · 26.9344 citations · h-index: 132 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the degradation of text quality in long sequences generated by language models, a problem that matters for applications such as content generation and dialogue systems. The advance is incremental, since it builds on existing truncation methods.

The authors frame truncation sampling as desmoothing and introduce η-sampling, which truncates words whose probability falls below an entropy-dependent threshold. According to the abstract, η-sampling produces more plausible long English documents in human evaluation, is better at breaking out of repetition, and behaves more reasonably on a battery of test distributions.

Long samples of text from neural language models can be of poor quality. Truncation sampling algorithms, like top-$p$ or top-$k$, address this by setting some words' probabilities to zero at each step. This work provides a framing for the aim of truncation, and an improved algorithm for that aim. We propose thinking of a neural language model as a mixture of a true distribution and a smoothing distribution that avoids infinite perplexity. In this light, truncation algorithms aim to perform desmoothing, estimating a subset of the support of the true distribution. Finding a good subset is crucial: we show that top-$p$ unnecessarily truncates high-probability words, for example causing it to truncate all words but "Trump" for a document that starts with "Donald". We introduce $η$-sampling, which truncates words below an entropy-dependent probability threshold. Compared to previous algorithms, $η$-sampling generates more plausible long English documents according to humans, is better at breaking out of repetition, and behaves more reasonably on a battery of test distributions.
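The contrast between top-$p$ and an entropy-dependent threshold can be sketched in a few lines of Python. The abstract does not state the exact threshold, so this sketch assumes the form $η = \min(ε, \sqrt{ε}\,e^{-H})$, where $H$ is the entropy of the next-word distribution and $ε$ is a hyperparameter. The function names, the default values, and the toy distribution are illustrative assumptions, not the authors' released code.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability list."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def eta_truncate(probs, epsilon=0.0009):
    """Zero out words below an entropy-dependent threshold, then renormalize.

    ASSUMPTION: the threshold is min(epsilon, sqrt(epsilon) * exp(-H)).
    The abstract only says the threshold is "entropy-dependent".
    """
    h = entropy(probs)
    eta = min(epsilon, math.sqrt(epsilon) * math.exp(-h))
    kept = [p if p >= eta else 0.0 for p in probs]
    if not any(kept):
        # Defensive guard: keep the most likely word if nothing survived.
        best = max(range(len(probs)), key=probs.__getitem__)
        kept[best] = probs[best]
    total = sum(kept)
    return [p / total for p in kept]


def top_p_truncate(probs, top_p=0.95):
    """Keep the smallest set of top words whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept = [0.0] * len(probs)
    mass = 0.0
    for i in order:
        kept[i] = probs[i]
        mass += probs[i]
        if mass >= top_p:
            break
    total = sum(kept)
    return [p / total for p in kept]


# Toy low-entropy distribution: one dominant word plus plausible alternatives.
peaked = [0.96, 0.03, 0.0099, 0.0001]
# Toy high-entropy distribution: 1000 equally likely words.
flat = [0.001] * 1000

peaked_eta = eta_truncate(peaked)
peaked_top_p = top_p_truncate(peaked)
flat_eta = eta_truncate(flat)
```

On the peaked toy distribution, top-$p$ with 0.95 keeps only the dominant word, which mirrors the "Donald" example, while the assumed η rule keeps the two alternatives at 0.03 and 0.0099 and drops only the 0.0001 tail. On the flat distribution the threshold shrinks with the entropy, so every word survives.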
