Investigating Label Bias in Beam Search for Open-ended Text Generation
The work addresses a practical issue for NLP researchers and practitioners, namely the quality of beam search output in open-ended text generation, though the contribution is incremental because it builds on existing methods.
The paper tackles the problem of beam search producing repetitive and generic texts in open-ended text generation. It identifies label bias as a major cause and shows that combining locally normalized and globally normalized training reduces this bias with minimal sacrifice in perplexity, leading to more diverse and meaningful texts in experiments.
Beam search is an effective and widely used decoding algorithm in many sequence-to-sequence (seq2seq) text generation tasks. However, in open-ended text generation, beam search is often found to produce repetitive and generic texts, and sampling-based decoding algorithms such as top-k sampling and nucleus sampling are preferred instead. Standard seq2seq models suffer from label bias due to their locally normalized probability formulation. This paper provides a series of empirical findings showing that label bias is a major reason for such degenerate behaviors of beam search. By combining locally normalized maximum likelihood estimation and globally normalized sequence-level training, label bias can be reduced with almost no sacrifice in perplexity. To quantitatively measure label bias, we test the model's ability to discriminate between the ground-truth text and a set of context-agnostic distractors. We conduct experiments on large-scale response generation datasets. Results show that beam search can produce more diverse and meaningful texts with our approach, in terms of both automatic and human evaluation metrics. Our analysis also suggests several directions for future work towards the grand challenge of open-ended text generation.
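The discrimination test mentioned in the abstract can be illustrated with a small sketch. This is not the paper's evaluation code: the names `discrimination_accuracy`, `toy_context_aware_score`, and `toy_context_blind_score` are hypothetical, the exact metric used in the paper is not given here, and the two scoring functions are toy stand-ins for a model's sequence score. The sketch assumes the measure is the fraction of distractors that the scorer ranks below the ground-truth text.

```python
from typing import Callable, Sequence

# A scorer maps (context, response) to a real-valued score, e.g. a model's
# sequence log-probability. The scorers below are toy assumptions.
Scorer = Callable[[str, str], float]


def discrimination_accuracy(
    score: Scorer, context: str, groundtruth: str, distractors: Sequence[str]
) -> float:
    """Return the fraction of distractors scored strictly below the ground truth."""
    if not distractors:
        raise ValueError("need at least one distractor")
    gt_score = score(context, groundtruth)
    wins = sum(1 for d in distractors if score(context, d) < gt_score)
    return wins / len(distractors)


def toy_context_aware_score(context: str, response: str) -> float:
    """Toy stand-in: reward words shared with the context, penalize length."""
    context_words = set(context.split())
    words = response.split()
    overlap = sum(1 for w in words if w in context_words)
    return overlap - 0.1 * len(words)


def toy_context_blind_score(context: str, response: str) -> float:
    """Toy stand-in for a heavily biased scorer: the context is ignored."""
    return -0.1 * len(response.split())


context = "do you like jazz music"
groundtruth = "i like jazz a lot"
# Context-agnostic distractors: generic replies that fit almost any input.
distractors = ["i do not know", "that is great", "i like it too"]

aware = discrimination_accuracy(toy_context_aware_score, context, groundtruth, distractors)
blind = discrimination_accuracy(toy_context_blind_score, context, groundtruth, distractors)
print(f"context-aware scorer: {aware:.2f}")
print(f"context-blind scorer: {blind:.2f}")
```

With these toy scorers, the context-aware one ranks the ground truth above every distractor, while the context-blind one, which prefers short replies regardless of the input, ranks it below all of them. A real evaluation would replace the toy scorers with the model's own sequence score.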