LG · CL · Feb 7, 2024

Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation

arXiv:2403.06988v1 · 102 citations · h-index: 64 · ICML
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of efficiently generating LLM output in a specific format, and represents a strong incremental improvement over existing constrained decoding methods.

The paper tackles the problem that constrained decoding in large language models often incurs performance overhead and can reduce task accuracy when the constraints are misaligned with the model's sub-word vocabulary. It introduces DOMINO, a decoding algorithm that enforces constraints with full sub-word alignment, achieving virtually no overhead and, in some cases, almost 2× speedup over unconstrained decoding.

To ensure that text generated by large language models (LLMs) is in an expected format, constrained decoding proposes to enforce strict formal language constraints during generation. However, as we show in this work, not only do such methods incur performance overhead during generation, but many of them also significantly impair task accuracy if they do not correctly align the underlying LLM sub-word vocabularies with external constraints. To address this, we present a novel decoding algorithm, DOMINO, that can enforce constraints in a fully subword-aligned fashion, while leveraging pre-computation and speculative decoding to achieve virtually no overhead and in some cases even almost 2× speedup over unconstrained decoding, thereby outperforming existing approaches by a wide margin.
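To make the idea of constrained decoding concrete, here is a minimal Python sketch of generic token masking. It is not the DOMINO algorithm and uses none of its pre-computation or speculative decoding. Everything in it is an assumption made for illustration: the toy vocabulary `VOCAB`, the finite "language" `ALLOWED`, and the fixed scores in `toy_scores`, which stand in for real LLM logits. At each step, a token is kept only if appending it leaves the text a valid prefix of some allowed string. Because the check is done on the resulting text rather than on one grammar symbol at a time, a sub-word token that spans two symbols (here `1}`) stays available.

```python
import math

# Hypothetical toy "formal language": a finite set of allowed outputs.
ALLOWED = ['{"a":1}', '{"b":2}']

# Hypothetical sub-word vocabulary. Note '1}' bridges a value and the closing brace.
VOCAB = ['{"', 'a', 'b', '":', '1', '2', '}', '1}', 'x', '<eos>']


def is_valid_prefix(text):
    """True if `text` can still be extended to some string in ALLOWED."""
    return any(s.startswith(text) for s in ALLOWED)


def allowed_mask(prefix):
    """One boolean per vocabulary entry: may this token be emitted next?"""
    mask = []
    for tok in VOCAB:
        if tok == '<eos>':
            # Stopping is only legal once the output is a complete allowed string.
            mask.append(prefix in ALLOWED)
        else:
            mask.append(is_valid_prefix(prefix + tok))
    return mask


def constrained_greedy(score_fn, max_steps=16):
    """Greedy decoding where disallowed tokens get a score of -inf."""
    prefix, tokens = "", []
    for _ in range(max_steps):
        scores = score_fn(prefix)
        mask = allowed_mask(prefix)
        masked = [s if ok else -math.inf for s, ok in zip(scores, mask)]
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        if masked[best] == -math.inf or VOCAB[best] == '<eos>':
            break
        prefix += VOCAB[best]
        tokens.append(VOCAB[best])
    return prefix, tokens


def toy_scores(prefix):
    """Stand-in for model logits: fixed preferences, ignoring the prefix."""
    pref = {'x': 5.0, '1}': 4.0, 'a': 3.0, '{"': 2.0, '":': 1.5,
            'b': 1.0, '1': 0.5, '2': 0.4, '}': 0.3, '<eos>': 0.0}
    return [pref[tok] for tok in VOCAB]


if __name__ == "__main__":
    text, tokens = constrained_greedy(toy_scores)
    print(text, tokens)
```

With these toy scores the unconstrained favourite `x` is always masked out, and the run yields `{"a":1}` from the tokens `{"`, `a`, `":`, `1}`. A mask that only admitted tokens matching a single grammar symbol would have forced `1` followed by `}` instead, which is the kind of sub-word misalignment the abstract warns about.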

