CL Oct 2, 2023

Closing the Curious Case of Neural Text Degeneration

Matthew Finlayson, John Hewitt, Alexander Koller, Swabha Swayamdipta, Ashish Sabharwal

arXiv:2310.01693v19.633 citations h-index: 56 Has Code

Originality: Incremental advance

AI Analysis

This work addresses a fundamental problem in natural language generation for researchers and practitioners by providing theoretical insights and incremental improvements to sampling algorithms.

The paper tackles the lack of theoretical understanding of why truncation sampling methods like nucleus sampling are effective in neural text generation. It also develops a new sampling strategy based on the softmax bottleneck that, in automatic and human evaluations, outperforms threshold-based methods in low-entropy open-ended generation.

Despite their ubiquity in language generation, it remains unknown why truncation sampling heuristics like nucleus sampling are so effective. We provide a theoretical explanation for the effectiveness of truncation sampling by proving that truncation methods that discard tokens below some probability threshold (the most common type of truncation) can guarantee that all sampled tokens have nonzero true probability. However, thresholds are a coarse heuristic, and necessarily discard some tokens with nonzero true probability as well. In pursuit of a more precise sampling strategy, we show that we can leverage a known source of model errors, the softmax bottleneck, to prove that certain tokens have nonzero true probability, without relying on a threshold. Based on our findings, we develop an experimental truncation strategy and present pilot studies demonstrating the promise of this type of algorithm. Our evaluations show that our method outperforms its threshold-based counterparts under automatic and human evaluation metrics for low-entropy (i.e., close to greedy) open-ended text generation. Our theoretical findings and pilot experiments both provide insight into why truncation sampling works and make progress toward more expressive sampling algorithms that better surface the generative capabilities of large language models.
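
For readers unfamiliar with the threshold-style truncation the abstract refers to, the following is a minimal sketch of two common variants: a fixed probability cutoff and nucleus (top-p) truncation. It is not the paper's bottleneck-aware method or its released code. The function names, the toy distribution `model_probs`, and the fallback to the single most likely token are all assumptions made for illustration.

```python
import random


def threshold_truncate(probs, epsilon):
    """Zero out tokens whose probability is below epsilon, then renormalize."""
    kept = [p if p >= epsilon else 0.0 for p in probs]
    total = sum(kept)
    if total == 0.0:
        # Assumed fallback: if every token was cut, keep only the most likely one.
        best = max(range(len(probs)), key=lambda i: probs[i])
        return [1.0 if i == best else 0.0 for i in range(len(probs))]
    return [p / total for p in kept]


def nucleus_truncate(probs, top_p):
    """Keep the smallest set of most probable tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = [0.0] * len(probs)
    mass = 0.0
    for i in order:
        kept[i] = probs[i]
        mass += probs[i]
        if mass >= top_p:
            break
    return [p / mass for p in kept]


def sample(probs, rng=random):
    """Draw one token index from a (truncated, renormalized) distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical next-token distribution over a five-token vocabulary.
model_probs = [0.5, 0.3, 0.15, 0.04, 0.01]

eps_dist = threshold_truncate(model_probs, epsilon=0.05)
nuc_dist = nucleus_truncate(model_probs, top_p=0.9)
token = sample(eps_dist)
```

On this toy distribution both variants discard the two low-probability tail tokens, so neither can ever be sampled. This is the behavior the abstract describes: the tail is removed wholesale, including any tokens there that may have nonzero true probability.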

