LG | Feb 23

Is Your Diffusion Sampler Actually Correct? A Sampler-Centric Evaluation of Discrete Diffusion Language Models

Luhan Tang, Longxuan Yu, Shaorong Zhang, Greg Ver Steeg

arXiv:2602.19619v11.4 | h-index: 3 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a critical evaluation challenge for researchers developing discrete diffusion models, revealing that current benchmarks can give a misleading picture of sampler correctness.

The paper tackles the problem of evaluating discrete diffusion language models. It shows that existing metrics conflate denoiser error with sampler error, and that few-step samplers are not distributionally correct even with an oracle denoiser: the error vanishes only as the number of sampling steps approaches the sequence length.

Discrete diffusion language models (dLLMs) provide a fast and flexible alternative to autoregressive models (ARMs) via iterative denoising with parallel updates. However, their evaluation is challenging: existing metrics conflate denoiser approximation error with sampler-induced error from the sampling dynamics, a problem that does not arise for ARMs whose autoregressive sampling exactly reflects the learned probability model. We introduce a sampler-centric oracle framework that replaces learned denoisers with an exact Hidden Markov Model posterior derived from a ground-truth Markov chain, isolating sampler-induced error in a controlled setting. We show that few-step discrete diffusion samplers are not distributionally correct even under an oracle denoiser, with transition-level mismatch that vanishes only as the number of steps approaches the sequence length. Moreover, improvements in negative log-likelihood, generative perplexity, or MAUVE do not imply correct sampling. Code is available at https://luhantang.github.io/dllm_sampler
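The gap between an exact denoiser and a correct sampler can be seen in a toy case. The sketch below is not the paper's framework or code: it assumes a hypothetical two-state Markov chain (the values of `PI` and `P` are made up), sequences of length 2, and a sampler that starts from a fully masked sequence. All function names are illustrative. It compares unmasking both tokens in one parallel step against unmasking one token per step, using exact posteriors in both cases.

```python
from itertools import product

# Hypothetical two-state Markov chain; values chosen for illustration only.
PI = [0.5, 0.5]                # distribution of the first token
P = [[0.9, 0.1], [0.1, 0.9]]   # P[a][b] = Pr(next token = b | current token = a)
STATES = range(2)


def true_joint():
    """Exact distribution over length-2 sequences under the chain."""
    return {(a, b): PI[a] * P[a][b] for a, b in product(STATES, STATES)}


def marginals(joint):
    """Per-position marginals of a joint over length-2 sequences."""
    m1 = [sum(joint[(a, b)] for b in STATES) for a in STATES]
    m2 = [sum(joint[(a, b)] for a in STATES) for b in STATES]
    return m1, m2


def one_step_parallel(joint):
    """Unmask both positions at once from the fully masked state.

    With nothing revealed, the exact posterior of each position is its
    marginal, and a parallel update samples the positions independently,
    so the output is the product of the marginals.
    """
    m1, m2 = marginals(joint)
    return {(a, b): m1[a] * m2[b] for a, b in product(STATES, STATES)}


def two_step_sequential(joint):
    """Unmask one position per step, in a uniformly random order.

    The second token is drawn from the exact posterior given the first.
    """
    m1, m2 = marginals(joint)
    out = {}
    for a, b in product(STATES, STATES):
        first_then_second = m1[a] * (joint[(a, b)] / m1[a])
        second_then_first = m2[b] * (joint[(a, b)] / m2[b])
        out[(a, b)] = 0.5 * first_then_second + 0.5 * second_then_first
    return out


def total_variation(p, q):
    """Total variation distance between two distributions on the same keys."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)


target = true_joint()
tv_one_step = total_variation(target, one_step_parallel(target))
tv_two_step = total_variation(target, two_step_sequential(target))
print(f"one parallel step: TV = {tv_one_step:.3f}")
print(f"two sequential steps: TV = {tv_two_step:.3f}")
```

For these assumed values, the one-step sampler is at total variation distance 0.4 from the chain even though every posterior it uses is exact, while the two-step sampler (steps equal to sequence length) recovers the chain's distribution. This mirrors the abstract's claim only in a minimal setting; the paper's oracle is an HMM posterior over longer sequences.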

