Waste Not, Want Not; Recycled Gumbel Noise Improves Consistency in Natural Language Generation
This work addresses reliability issues in language models for practical applications, though it appears incremental, as it builds on existing sampling methods.
The paper tackles the problem of inconsistent outputs from language models by proposing a novel decoding algorithm that improves response consistency across different prompts without degrading quality, achieving up to 10% better performance on consistency benchmarks.
Consistency in the output of language models is critical for their reliability and practical utility. Due to their training objective, language models learn to model the full space of possible continuations, leading to outputs that can vary significantly in style and content, even for similar or repeated inputs. To address this, we propose a novel decoding algorithm that enhances response consistency across different prompts with no degradation in response quality. By incorporating a latent variable into the next-token sampling process based on the Gumbel reparametrisation trick, our method outperforms standard sampling by up to 10% across semantic and stylistic consistency benchmarks. Additionally, our approach integrates seamlessly with existing sampling methods with negligible computational overhead, providing a practical solution for improving the reliability of language model outputs.
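To illustrate the mechanism the abstract relies on, the following is a minimal sketch of the Gumbel-max trick with noise shared across prompts. It is not the paper's algorithm: the function names (`gumbel_noise`, `sample_token`, `agreement_rate`), the seeding scheme, and the toy logits are all assumptions made for illustration, and the sketch does not attempt to reproduce how the paper recycles noise. It only shows why treating the Gumbel noise as a latent variable that is held fixed across prompts can make similar next-token distributions yield the same token more often than independent sampling does.

```python
import numpy as np

def gumbel_noise(seed, position, vocab_size):
    """Standard Gumbel noise that is a deterministic function of (seed, position).

    Hypothetical helper: the (seed, position) keying is an assumption,
    not something specified in the abstract.
    """
    rng = np.random.default_rng([seed, position])
    return rng.gumbel(size=vocab_size)

def sample_token(logits, noise):
    """Gumbel-max trick: argmax(logits + noise) is a draw from softmax(logits)."""
    return int(np.argmax(np.asarray(logits, dtype=float) + noise))

def agreement_rate(logits_a, logits_b, trials=500, shared=True):
    """Fraction of trials in which two prompts yield the same next token.

    shared=True reuses one noise vector for both prompts;
    shared=False draws an independent noise vector for the second prompt.
    """
    hits = 0
    for t in range(trials):
        noise_a = gumbel_noise(seed=t, position=0, vocab_size=len(logits_a))
        if shared:
            noise_b = noise_a
        else:
            # Seeds t + trials never collide with the seeds 0..trials-1 used above.
            noise_b = gumbel_noise(seed=t + trials, position=0, vocab_size=len(logits_b))
        hits += sample_token(logits_a, noise_a) == sample_token(logits_b, noise_b)
    return hits / trials

# Toy next-token logits for two similar prompts (made-up values).
logits_a = [2.0, 1.0, 0.5, 0.0, 0.0]
logits_b = [2.1, 0.9, 0.5, 0.1, 0.0]

shared_rate = agreement_rate(logits_a, logits_b, shared=True)
independent_rate = agreement_rate(logits_a, logits_b, shared=False)
print(f"shared noise: {shared_rate:.2f}, independent noise: {independent_rate:.2f}")
```

In both settings each prompt's token is still marginally a sample from its own softmax distribution, which is consistent with the abstract's claim that consistency can improve without degrading response quality. The extra cost is one noise vector per position, in line with the negligible overhead the abstract describes.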