Local and Global Decoding in Text Generation
This work addresses a technical issue in text generation for applications such as dialogue systems, but its contribution is incremental, since it builds on existing decoding methods.
The paper tackles the problem of distortion in text-generation decoding algorithms. It introduces globally-normalised versions of the top-k and top-π methods and proposes an independent Metropolis-Hastings algorithm to approximate sampling from these distributions. Results show that global decoding performs worse than local decoding in most configurations, which suggests that distortion is a key feature of local methods.
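To make the local/global distinction concrete, here is a minimal sketch on a hypothetical toy model small enough to enumerate every string. `TOY_LM`, `LENGTH`, `kept_tokens` and `local_vs_global` are illustrative names, not from the paper, and the paper's experiments use Pythia models rather than anything like this. Local decoding renormalises the truncated next-token distribution at each step. Global decoding keeps the model's own probability p(y) on every string whose tokens all survive truncation, and renormalises once over that set.

```python
import itertools

# Hypothetical toy language model over a 3-token vocabulary {0, 1, 2}:
# next-token probabilities keyed by the prefix (a tuple of token ids).
TOY_LM = {
    (): [0.5, 0.3, 0.2],
    (0,): [0.4, 0.4, 0.2],
    (1,): [0.9, 0.05, 0.05],
    (2,): [0.6, 0.3, 0.1],
}
LENGTH = 2  # every string has exactly two tokens (no EOS, for simplicity)


def kept_tokens(probs, rule, param):
    """Token ids that survive top-k or top-pi (nucleus) truncation at one step."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if rule == "top_k":
        return set(order[:param])
    if rule == "top_pi":
        kept, mass = set(), 0.0
        for i in order:
            kept.add(i)
            mass += probs[i]
            if mass >= param:  # smallest high-probability set with mass >= pi
                break
        return kept
    raise ValueError(f"unknown rule: {rule}")


def local_vs_global(rule, param):
    """Map every string to (p_model, q_local, q_global) by exhaustive enumeration."""
    rows = {}
    for seq in itertools.product(range(3), repeat=LENGTH):
        p_model, q_local, survives = 1.0, 1.0, True
        for t, tok in enumerate(seq):
            probs = TOY_LM[seq[:t]]
            kept = kept_tokens(probs, rule, param)
            p_model *= probs[tok]
            if tok in kept:
                # Local normalisation: renormalise over the tokens kept at *this* step.
                q_local *= probs[tok] / sum(probs[i] for i in kept)
            else:
                survives, q_local = False, 0.0
        rows[seq] = (p_model, q_local, survives)
    # Global normalisation: keep the model's own probabilities on surviving
    # strings and renormalise once, over the whole set of surviving strings.
    z = sum(p for p, _, ok in rows.values() if ok)
    return {s: (p, q, p / z if ok else 0.0) for s, (p, q, ok) in rows.items()}


if __name__ == "__main__":
    for seq, (p, q_loc, q_glob) in local_vs_global("top_k", 2).items():
        if q_glob > 0:
            print(seq, f"model={p:.3f}  local={q_loc:.3f}  global={q_glob:.3f}")
```

With top-2, the local scheme assigns 0.3125 to the string (0, 0), compared with about 0.292 under global normalisation. After prefix (0,), only 0.8 of the mass is kept, so renormalising inflates those continuations more than those after prefix (1,), where 0.95 is kept. The global scheme preserves the model's probability ratios between surviving strings. The local scheme does not, and this is the distortion the paper studies.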
Text generation, a key component in applications such as dialogue systems, relies on decoding algorithms that sample strings from a language model distribution. Traditional methods, such as top-$k$ and top-$\pi$, apply local normalisation to the model's output distribution, which can distort it. In this paper, we investigate the effect of this distortion by introducing globally-normalised versions of these decoding methods. Additionally, we propose an independent Metropolis-Hastings algorithm to approximate sampling from globally-normalised distributions without explicitly computing them. Our empirical analysis compares the performance of local and global normalisation across two decoding algorithms (top-$k$ and top-$\pi$) with various hyperparameters, using Pythia language models. Results show that, in most configurations, global decoding performs worse than the local decoding versions of the same algorithms, despite preserving the distribution's integrity. Our results suggest that distortion is an important feature of local decoding algorithms.
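The following is a minimal sketch of how an independent Metropolis-Hastings sampler can target a globally-normalised top-k distribution without computing its normaliser. It uses local top-k decoding as the proposal. This derivation is our reading of the idea, and the paper's exact formulation may differ. On the truncated support, the local probability is $q(y) = p(y) / W(y)$, where $W(y)$ is the product over steps of the probability mass that top-k keeps. The global target is $p(y)/Z$, so target/proposal is proportional to $W(y)$. The acceptance probability therefore reduces to $\min(1, W(y')/W(y))$, and $Z$ cancels. `TOY_LM`, `propose` and `independent_mh` are hypothetical stand-ins for a real model such as Pythia.

```python
import random
from collections import Counter

# Hypothetical toy language model: next-token probabilities over token ids
# {0, 1, 2}, keyed by the prefix (a tuple of token ids).
TOY_LM = {
    (): [0.5, 0.3, 0.2],
    (0,): [0.4, 0.4, 0.2],
    (1,): [0.9, 0.05, 0.05],
    (2,): [0.6, 0.3, 0.1],
}
LENGTH = 2  # fixed string length, no EOS handling


def propose(rng, k):
    """Draw one string with local top-k decoding.

    Also returns its weight W(y): the product over steps of the probability
    mass that top-k kept. W(y) is proportional to (global target) / (local proposal).
    """
    seq, weight = (), 1.0
    for _ in range(LENGTH):
        probs = TOY_LM[seq]
        kept = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
        weight *= sum(probs[i] for i in kept)
        # random.choices accepts unnormalised weights, i.e. local renormalisation.
        seq += (rng.choices(kept, weights=[probs[i] for i in kept])[0],)
    return seq, weight


def independent_mh(num_steps, k, seed=0):
    """Independent Metropolis-Hastings targeting globally-normalised top-k."""
    rng = random.Random(seed)
    current, w_current = propose(rng, k)  # initialise from the proposal (no burn-in)
    samples = []
    for _ in range(num_steps):
        candidate, w_candidate = propose(rng, k)
        # Accept with probability min(1, W(candidate) / W(current)); the global
        # normaliser cancels, so it never has to be computed.
        if rng.random() < min(1.0, w_candidate / w_current):
            current, w_current = candidate, w_candidate
        samples.append(current)
    return samples


if __name__ == "__main__":
    n = 100_000
    counts = Counter(independent_mh(n, k=2))
    for seq in sorted(counts):
        print(seq, round(counts[seq] / n, 3))
```

For this toy model with k = 2, the empirical frequencies should approach the global target p(y) renormalised over the four surviving strings. These are roughly 0.292 for (0, 0) and (0, 1), 0.394 for (1, 0) and 0.022 for (1, 1). Plain local sampling would instead give 0.3125 for each of the first two strings. Using the local decoder as the proposal keeps each step as cheap as ordinary decoding. The price is that consecutive samples are correlated whenever a candidate is rejected.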