CL, LG, ML | Oct 18, 2022

Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models

arXiv:2210.15458v2 | 18 citations | h-index: 16
Originality: Highly original
AI Analysis

This paper addresses efficient and diverse text generation with large language models, offering a decoding method that is embarrassingly parallel while guaranteeing diverse outputs under certain conditions.

The paper tackles the trade-off between output diversity and computational parallelism in large language model decoding by introducing a framework for sampling according to an arithmetic code book. On WMT machine translation, it more than halves the standard deviation when estimating expected BLEU score reward and closes the BLEU score gap between independent sampling and beam search by up to 63%.

Decoding methods for large language models often trade off diversity of outputs against parallelism of computation. Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize. Alternatively, methods such as temperature sampling and its modifications (top-k sampling, nucleus sampling, typical decoding, and others) are embarrassingly parallel, but offer no guarantees against duplicate samples. We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model. The framework is compatible with common sampling variations, has provable beam diversity under certain conditions, is embarrassingly parallel, and provides unbiased and consistent expectations from the original model. We demonstrate the effectiveness of our approach on WMT machine translation, more than halving the standard deviation when estimating expected BLEU score reward, and closing the BLEU score gap between independent sampling and beam search by up to 63%.
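
The idea of sampling from an arithmetic code book can be illustrated with a minimal sketch. It is not the authors' implementation: the toy Markov model (`START`, `TRANSITIONS`, `toy_next_token_probs`), the helper names, and the choice of evenly spaced code points with one shared random offset are all assumptions made for illustration. Each code point in [0, 1) is decoded on its own by repeatedly splitting the current interval in proportion to the model's next-token probabilities, which is why the procedure parallelizes trivially.

```python
import numpy as np

# Hypothetical toy "language model": a 3-token Markov chain (illustration only).
START = np.array([0.2, 0.5, 0.3])
TRANSITIONS = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.3, 0.3, 0.4],
])

def toy_next_token_probs(prefix):
    """Return the next-token distribution given the tokens decoded so far."""
    return START if not prefix else TRANSITIONS[prefix[-1]]

def decode_code_point(u, next_probs, max_len=5):
    """Decode the sequence whose code-book interval contains the point u."""
    tokens = []
    lo, hi = 0.0, 1.0  # interval of [0, 1) owned by the current prefix
    for _ in range(max_len):
        probs = next_probs(tokens)
        # Split [lo, hi) into sub-intervals proportional to token probabilities.
        edges = lo + (hi - lo) * np.concatenate(([0.0], np.cumsum(probs)))
        tok = int(np.searchsorted(edges, u, side="right")) - 1
        tok = min(max(tok, 0), len(probs) - 1)  # guard against float round-off
        lo, hi = edges[tok], edges[tok + 1]
        tokens.append(tok)
    return tokens

def arithmetic_sample(n, next_probs, max_len=5, seed=0):
    """Draw n sequences from evenly spaced code points with a shared random shift."""
    rng = np.random.default_rng(seed)
    offset = rng.uniform()               # one random number shared by all samples
    codes = (np.arange(n) + offset) / n  # assumed lattice of code points in [0, 1)
    # Each code point is decoded independently, so this loop could run in parallel.
    return [decode_code_point(u, next_probs, max_len) for u in codes]

if __name__ == "__main__":
    for seq in arithmetic_sample(8, toy_next_token_probs):
        print(seq)
```

In this toy setting no length-5 sequence has probability above 1/8, so no two of the eight code points can fall in the same interval and the samples are guaranteed to be distinct. A practical implementation would also need to handle numerical precision for long sequences, which this sketch ignores.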

Code Implementations: 1 repo
