CL · Apr 12, 2025

Efficient and Asymptotically Unbiased Constrained Decoding for Large Language Models

arXiv:2504.09135v1 · 6 citations · h-index: 38 · AISTATS
Originality: Highly original
AI Analysis

The paper addresses a practical bottleneck for real-world LLM applications that require constraint adherence, and it presents a novel method rather than an incremental improvement.

The paper tackles the inefficiency and bias of constrained decoding in large language models. It introduces DISC with PPV, which achieves theoretically guaranteed asymptotic unbiasedness and shows better efficiency and output quality than existing methods in experiments.

In real-world applications of large language models, outputs are often required to be confined: selecting items from predefined product or document sets, generating phrases that comply with safety standards, or conforming to specialized formatting styles. To control generation, constrained decoding has been widely adopted. However, existing prefix-tree-based constrained decoding is inefficient under GPU-based model inference paradigms, and it introduces unintended biases into the output distribution. This paper introduces Dynamic Importance Sampling for Constrained Decoding (DISC) with GPU-based Parallel Prefix-Verification (PPV), a novel algorithm that leverages dynamic importance sampling to achieve theoretically guaranteed asymptotic unbiasedness and overcomes the inefficiency of prefix trees. Extensive experiments demonstrate the superiority of our method over existing methods in both efficiency and output quality. These results highlight the potential of our method to improve constrained generation in applications where adherence to specific constraints is essential.
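The bias mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's DISC or PPV algorithm. It is a minimal illustration of plain importance weighting, and the two-token model `MODEL` and the constraint set `ALLOWED` are hypothetical values chosen for the example. It computes exactly the distribution produced by token-level prefix-tree masking (renormalizing over allowed tokens at every step), compares it with the model's true distribution conditioned on the constraint set, and shows that weighting each sequence by the product of its per-step allowed masses recovers that target.

```python
# Hypothetical toy "language model": next-token probabilities given a prefix.
MODEL = {
    "": {"a": 0.5, "b": 0.5},
    "a": {"a": 0.9, "b": 0.1},
    "b": {"a": 0.5, "b": 0.5},
}
# Hypothetical constraint set: the only outputs we are allowed to generate.
ALLOWED = {"ab", "ba", "bb"}


def allowed_next(prefix):
    """Tokens that keep `prefix` on a path of the prefix tree built from ALLOWED."""
    return {s[len(prefix)] for s in ALLOWED
            if s.startswith(prefix) and len(s) > len(prefix)}


def sequence_probability(seq):
    """Unconstrained model probability of a full sequence."""
    prob = 1.0
    for t, tok in enumerate(seq):
        prob *= MODEL[seq[:t]][tok]
    return prob


def target_distribution():
    """Model distribution conditioned on the output lying in ALLOWED."""
    joint = {seq: sequence_probability(seq) for seq in ALLOWED}
    total = sum(joint.values())
    return {seq: p / total for seq, p in joint.items()}


def masked_decoding_distribution():
    """Exact distribution of token-level masked decoding, plus importance weights.

    At each step the model's probabilities are renormalized over the allowed
    tokens. The weight of a sequence is the product of the allowed masses z,
    which equals (model probability) / (masked-decoding probability).
    """
    q, weight = {}, {}
    for seq in ALLOWED:
        prob, w = 1.0, 1.0
        for t, tok in enumerate(seq):
            prefix = seq[:t]
            z = sum(MODEL[prefix][cand] for cand in allowed_next(prefix))
            prob *= MODEL[prefix][tok] / z
            w *= z
        q[seq], weight[seq] = prob, w
    return q, weight


def reweighted(q, weight):
    """Limit of self-normalized importance sampling over masked-decoding samples."""
    unnorm = {seq: q[seq] * weight[seq] for seq in q}
    total = sum(unnorm.values())
    return {seq: v / total for seq, v in unnorm.items()}


q, weight = masked_decoding_distribution()
target = target_distribution()
corrected = reweighted(q, weight)
for seq in sorted(ALLOWED):
    print(seq, round(q[seq], 4), round(target[seq], 4), round(corrected[seq], 4))
```

In this toy setting, masked decoding emits "ab" with probability 0.5, while the conditioned model assigns it only 1/11 (about 0.09), because the masking step silently discards the 0.9 mass on the disallowed continuation "aa". The reweighted distribution matches the target. With finite samples, a self-normalized estimate would only approach it as the sample count grows, which is the sense in which such a correction is asymptotically unbiased.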

