CL · Apr 11, 2022

A Call for Clarity in Beam Search: How It Works and When It Stops

AI2, UW
arXiv:2204.05424v3 · 87 citations · h-index: 114
Originality: Incremental advance
AI Analysis

This work addresses a clarity issue in beam search implementations for text generation, offering a simple, incremental improvement that benefits researchers and practitioners using libraries like Hugging Face Transformers.

The paper identifies that common beam search implementations use a first-come-first-served heuristic for stopping, which is often overlooked, and introduces a patience factor to generalize this criterion, improving decoding performance on news summarization and machine translation with negligible slowdown.

Text generation with beam search has proven successful in a wide range of applications. We point out that, though largely overlooked in the literature, the commonly used implementation of beam decoding (e.g., Hugging Face Transformers and fairseq) uses a first come, first served heuristic: it keeps a set of already completed sequences over time steps and stops when the size of this set reaches the beam size. Based on this finding, we introduce a patience factor, a simple modification to this beam decoding implementation, that generalizes the stopping criterion and provides flexibility to the depth of search. Empirical results demonstrate that adjusting this patience factor improves decoding performance of strong pretrained models on news text summarization and machine translation over diverse language pairs, with a negligible inference slowdown. Our approach only modifies one line of code and can thus be readily incorporated in any implementation. Further, we find that different versions of beam decoding result in large performance differences in summarization, demonstrating the need for clarity in specifying the beam search implementation in research work. Our code will be available upon publication.
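
The stopping rule described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' code or the Hugging Face/fairseq implementation: it is a simplified toy beam search that assumes the generalized criterion is "stop once the finished set holds patience × beam size sequences", scores hypotheses by summed log-probability without length normalization, and uses a hypothetical `toy_step` model invented purely for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"
Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens, summed log-probability)


def beam_search(
    step_fn: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_size: int,
    patience: float = 1.0,
    max_len: int = 20,
) -> Hypothesis:
    """First come, first served beam search with a patience factor (sketch)."""
    beam: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    # Assumed generalization: patience = 1.0 recovers "stop when the
    # finished set reaches the beam size".
    limit = beam_size * patience

    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for prefix, score in beam:
            for token, log_prob in step_fn(prefix).items():
                candidates.append((prefix + (token,), score + log_prob))
        candidates.sort(key=lambda h: h[1], reverse=True)

        # Keep the top beam_size candidates; completed ones move to `finished`.
        beam = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == EOS:
                finished.append((tokens, score))
                if len(finished) >= limit:  # the stopping criterion
                    return max(finished, key=lambda h: h[1])
            else:
                beam.append((tokens, score))
        if not beam:
            break

    # Fall back to unfinished hypotheses only if nothing completed.
    return max(finished or beam, key=lambda h: h[1])


# Hypothetical toy model: next-token log-probabilities keyed by prefix.
TOY_TABLE: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"a": math.log(0.6), EOS: math.log(0.3), "b": math.log(0.1)},
    ("a",): {"b": math.log(0.6), EOS: math.log(0.4)},
    ("a", "b"): {EOS: math.log(0.95), "a": math.log(0.05)},
}


def toy_step(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Any prefix not listed in the table ends immediately with probability 1.
    return TOY_TABLE.get(prefix, {EOS: 0.0})


short_tokens, short_score = beam_search(toy_step, beam_size=2, patience=1.0)
deep_tokens, deep_score = beam_search(toy_step, beam_size=2, patience=2.0)
print(short_tokens, math.exp(short_score))
print(deep_tokens, math.exp(deep_score))
```

On this toy model, `patience=1.0` stops as soon as two sequences have finished and returns the early, lower-probability output, while `patience=2.0` searches deeper and returns a higher-scoring sequence. The only difference between the two runs is the single line that computes `limit`.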

Code Implementations: 1 repo