LG · Feb 23

Is Your Diffusion Sampler Actually Correct? A Sampler-Centric Evaluation of Discrete Diffusion Language Models

arXiv:2602.19619v1 · h-index: 30 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a critical evaluation challenge for researchers developing discrete diffusion models, revealing that current benchmarks may mislead about sampler correctness.

The paper addresses the evaluation of discrete diffusion language models. It shows that existing metrics conflate denoiser error with sampler error, and that few-step samplers are not distributionally correct even with an oracle denoiser: the error vanishes only as the number of steps approaches the sequence length.

Discrete diffusion language models (dLLMs) provide a fast and flexible alternative to autoregressive models (ARMs) via iterative denoising with parallel updates. However, their evaluation is challenging: existing metrics conflate denoiser approximation error with sampler-induced error from the sampling dynamics, a problem that does not arise for ARMs whose autoregressive sampling exactly reflects the learned probability model. We introduce a sampler-centric oracle framework that replaces learned denoisers with an exact Hidden Markov Model posterior derived from a ground-truth Markov chain, isolating sampler-induced error in a controlled setting. We show that few-step discrete diffusion samplers are not distributionally correct even under an oracle denoiser, with transition-level mismatch that vanishes only as the number of steps approaches the sequence length. Moreover, improvements in negative log-likelihood, generative perplexity, or MAUVE do not imply correct sampling. Code is available at https://luhantang.github.io/dllm_sampler
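The central claim of the abstract can be illustrated on a toy problem. The sketch below is not the authors' code: it assumes a masked (absorbing-state) sampler that unmasks equal-sized groups of positions per step, a two-symbol first-order Markov chain with hypothetical parameters, a brute-force oracle in place of the paper's HMM posterior, and total variation on the full sequence distribution as a stand-in for the paper's transition-level metric. All function names are illustrative.

```python
import itertools

def chain_prob(x, pi, P):
    """Probability of sequence x under a first-order Markov chain."""
    p = pi[x[0]]
    for a, b in zip(x, x[1:]):
        p *= P[a][b]
    return p

def oracle_marginal(pos, tok, revealed, joint):
    """Exact p(x_pos = tok | revealed tokens), by enumerating all sequences."""
    num = den = 0.0
    for x, p in joint.items():
        if all(x[i] == v for i, v in revealed.items()):
            den += p
            if x[pos] == tok:
                num += p
    return num / den

def sampler_distribution(joint, length, steps):
    """Exact output distribution of a parallel unmasking sampler.

    Each step unmasks length // steps positions, drawing every one
    independently from its oracle marginal given the tokens revealed so far.
    The result is averaged uniformly over all unmasking orders.
    """
    group = length // steps
    orders = list(itertools.permutations(range(length)))
    out = {x: 0.0 for x in joint}
    for order in orders:
        for x in joint:
            prob, revealed = 1.0, {}
            for s in range(0, length, group):
                block = order[s:s + group]
                for i in block:  # same conditioning set for the whole block
                    prob *= oracle_marginal(i, x[i], revealed, joint)
                revealed.update({i: x[i] for i in block})
            out[x] += prob / len(orders)
    return out

def total_variation(p, q):
    """Total variation distance between two distributions on the same keys."""
    return 0.5 * sum(abs(p[x] - q[x]) for x in p)

def demo(length=4):
    # Hypothetical ground-truth chain: "sticky" transitions, uniform start.
    pi = [0.5, 0.5]
    P = [[0.9, 0.1], [0.2, 0.8]]
    joint = {x: chain_prob(x, pi, P)
             for x in itertools.product(range(2), repeat=length)}
    return {T: total_variation(joint, sampler_distribution(joint, length, T))
            for T in (1, 2, 4)}

if __name__ == "__main__":
    for steps, tv in demo().items():
        print(f"steps={steps}: TV to ground truth = {tv:.4f}")
```

With one token unmasked per step, the product of oracle conditionals is the chain rule, so the sampler reproduces the ground-truth distribution regardless of order. With fewer steps, tokens unmasked together are drawn independently and lose their mutual dependence, so the mismatch is nonzero even though the denoiser is exact.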

