CL · Apr 2

Dependency-Guided Parallel Decoding in Discrete Diffusion Language Models

arXiv:2604.02560 · 33.71 citations · h-index: 4

AI Analysis

The work addresses a bottleneck in accelerating text generation with language models, though the contribution is incremental, since it builds on existing parallel decoding methods.

The paper tackles distributional mismatch in parallel decoding for discrete diffusion language models, where dependencies between simultaneously unmasked tokens degrade output quality. It proposes DEMASK, a lightweight dependency predictor, and reports a 1.7-2.2× speedup while matching or improving accuracy.
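To make the mismatch concrete, here is a small Python sketch. It builds a hypothetical joint distribution over two masked positions whose tokens are perfectly dependent (the "New York" / "Los Angeles" pair is an illustrative assumption, not an example from the paper), forms the fully factorized product of per-position marginals that parallel decoding samples from, and measures the total variation distance between the two. The helper names `marginals`, `factorized`, and `total_variation` are hypothetical.

```python
from itertools import product

# Hypothetical toy joint over two masked positions (not from the paper):
# the two tokens are perfectly dependent.
joint = {
    ("New", "York"): 0.5,
    ("Los", "Angeles"): 0.5,
}


def marginals(joint, n_positions):
    """Per-position marginal distributions of a joint over token tuples."""
    margs = [dict() for _ in range(n_positions)]
    for tokens, p in joint.items():
        for i, tok in enumerate(tokens):
            margs[i][tok] = margs[i].get(tok, 0.0) + p
    return margs


def factorized(margs):
    """Fully factorized product of per-position marginals."""
    dist = {}
    for combo in product(*(m.items() for m in margs)):
        tokens = tuple(tok for tok, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        dist[tokens] = prob
    return dist


def total_variation(p, q):
    """Half the L1 distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


margs = marginals(joint, 2)
approx = factorized(margs)
print(approx[("New", "Angeles")])      # 0.25
print(total_variation(joint, approx))  # 0.5
```

Sampling both positions independently puts half of the probability mass on incoherent pairs such as "New Angeles". Unmasking one position first and conditioning the other on it avoids this, at the cost of an extra decoding step, which is the speed/fidelity trade-off the paper targets.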

Discrete diffusion language models (dLLMs) accelerate text generation by unmasking multiple tokens in parallel. However, parallel decoding introduces a distributional mismatch: it approximates the joint conditional distribution with a fully factorized product of per-token marginals, which degrades output quality when the selected tokens are strongly dependent. We propose DEMASK (DEpendency-guided unMASKing), a lightweight dependency predictor that attaches to the final hidden states of a dLLM. In a single forward pass, it estimates pairwise conditional influences between masked positions. Using these predictions, a greedy selection algorithm identifies a set of positions with bounded cumulative dependency to unmask simultaneously. Under a sub-additivity assumption, we prove that this bounds the total variation distance between our parallel sampling distribution and the model's joint distribution. Empirically, DEMASK achieves a 1.7-2.2× speedup on Dream-7B while matching or improving accuracy compared with confidence-based and KL-based baselines.
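The greedy selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not DEMASK's actual algorithm. The function name `greedy_select`, the use of a per-position confidence score to order candidates, the symmetric counting of pairwise scores as `dep[i][j] + dep[j][i]`, and the additive `budget` on cumulative dependency are all hypothetical choices. The matrix `dep` stands in for the predictor's pairwise influence estimates, and the toy values are made up.

```python
from typing import List, Sequence


def greedy_select(
    confidence: Sequence[float],
    dep: Sequence[Sequence[float]],
    budget: float,
) -> List[int]:
    """Hypothetical sketch: choose masked positions to unmask in one step.

    confidence[i]: score used to rank candidate positions (assumption).
    dep[i][j]: nonnegative predicted influence of position j on position i.
    budget: cap on the summed pairwise dependency within the chosen set.
    """
    # Visit candidates from most to least confident.
    order = sorted(range(len(confidence)), key=lambda i: confidence[i], reverse=True)
    selected: List[int] = []
    total = 0.0
    for i in order:
        # Dependency that adding i would introduce, counted in both directions.
        added = sum(dep[i][j] + dep[j][i] for j in selected)
        # Always accept the first candidate so every step unmasks something.
        if not selected or total + added <= budget:
            selected.append(i)
            total += added
    return selected


# Toy inputs (made up): positions 0 and 1 are strongly coupled.
conf = [0.9, 0.8, 0.7, 0.6]
dep = [
    [0.0, 0.4, 0.01, 0.01],
    [0.4, 0.0, 0.01, 0.01],
    [0.01, 0.01, 0.0, 0.01],
    [0.01, 0.01, 0.01, 0.0],
]
print(greedy_select(conf, dep, budget=0.1))  # [0, 2, 3]
```

With `budget=0.1`, the strongly coupled pair (positions 0 and 1) is split across steps, while the weakly coupled positions 2 and 3 join position 0. A rejected candidate does not stop the scan, so later, more independent positions can still be added; a tighter budget trades speed for fidelity to the joint.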

