LGMLMar 14, 2019

Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement

arXiv:1903.06059v2314 citations
Originality Incremental advance
AI Analysis

This provides a principled method for efficient and diverse sequence sampling in NLP, with incremental improvements over existing sampling techniques.

The paper tackles the problem of sampling sequences without replacement by introducing the Gumbel-Top-k trick, which enables exact sampling with computational cost linear in k and sequence length. In a translation task, it outperforms alternatives by producing diverse, high-quality translations and reduces variance in BLEU score and entropy estimators.

The well-known Gumbel-Max trick for sampling from a categorical distribution can be extended to sample $k$ elements without replacement. We show how to implicitly apply this 'Gumbel-Top-$k$' trick on a factorized distribution over sequences, allowing to draw exact samples without replacement using a Stochastic Beam Search. Even for exponentially large domains, the number of model evaluations grows only linear in $k$ and the maximum sampled sequence length. The algorithm creates a theoretical connection between sampling and (deterministic) beam search and can be used as a principled intermediate alternative. In a translation task, the proposed method compares favourably against alternatives to obtain diverse yet good quality translations. We show that sequences sampled without replacement can be used to construct low-variance estimators for expected sentence-level BLEU score and model entropy.

Code Implementations4 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes