LG · Mar 2

DUEL: Exact Likelihood for Masked Diffusion via Deterministic Unmasking

Gilad Turok, Chris De Sa, Volodymyr Kuleshov

arXiv:2603.01367v1 · 3 citations · h-index: 13

Originality: Highly original

AI Analysis

DUEL provides a principled evaluation method for masked diffusion models in text generation. It enables fair comparisons with autoregressive models and shows that MDMs perform better than earlier evaluations suggested.

The paper tackles the problem that masked diffusion models (MDMs) lack proper perplexity evaluation by introducing the DUEL framework, which enables exact likelihood computation under the same position selection used at test time. This reveals MDMs are substantially better than previously thought, with the MDM-autoregressive perplexity gap shrinking by up to 32% on in-domain data and 82% on zero-shot benchmarks.
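As a hedged illustration of why exact likelihood is possible (our notation, not necessarily the paper's), make two assumptions. First, tokens unmasked in the same step are drawn independently from the model's per-position predictions $p_\theta(\cdot \mid z)$. Second, the deterministic selection rule $\pi$ picks the set of positions $S_t$ from the current partially masked sequence alone. Under these assumptions a given sequence $x$ of length $L$ has exactly one unmasking path, and its log-likelihood decomposes over the $T$ steps of that path:

```latex
\begin{aligned}
z_0 &= (\texttt{[M]}, \dots, \texttt{[M]}), \qquad
S_t = \pi(z_{t-1}), \qquad
z_t^{i} = \begin{cases} x^{i} & i \in S_t \\ z_{t-1}^{i} & \text{otherwise} \end{cases} \\[4pt]
\log p_{\pi}(x) &= \sum_{t=1}^{T} \sum_{i \in S_t} \log p_\theta\!\left(x^{i} \mid z_{t-1}\right),
\qquad
\mathrm{PPL}(x) = \exp\!\left(-\frac{1}{L}\,\log p_{\pi}(x)\right)
\end{aligned}
```

Because $\pi$ never inspects tokens that are still masked, each term is exactly what the sampler computes at that step. The sum is therefore an exact log-probability under the test-time sampler rather than a bound.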

Masked diffusion models (MDMs) generate text by iteratively selecting positions to unmask and then predicting tokens at those positions. Yet MDMs lack proper perplexity evaluation. The ELBO is a loose bound on the likelihood under the training distribution, not the test-time distribution, while generative perplexity requires a biased external model and ignores diversity. To address this, we introduce the DUEL framework, which formalizes deterministic position selection and unifies leading MDM sampling strategies. We prove that DUEL admits exact likelihood computation via a simple algorithm, evaluated under the same position selection used at test time. This gives MDMs proper perplexity for the first time: the natural analogue of autoregressive perplexity. With proper perplexity in hand, we revisit key questions about MDMs. MDMs are substantially better than previously thought: the MDM-autoregressive perplexity gap shrinks by up to 32% on in-domain data and 82% on zero-shot benchmarks. DUEL enables the first principled comparison of fast, parallel samplers across compute budgets (an analysis impossible with the ELBO and unreliable with generative perplexity), identifying probability margin (Kim et al., 2025) as a strong default. Finally, oracle search over position orderings reveals that MDMs can far surpass autoregressive models, achieving 36.47 vs. 52.11 perplexity on AG News. This demonstrates that the ceiling of MDM performance has not yet been reached.
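The following is a minimal Python sketch of likelihood evaluation under deterministic unmasking. It is a toy, not the authors' code, and rests on several assumptions. `toy_denoiser` is a hypothetical stand-in for a trained MDM. Tokens unmasked in the same step are scored independently given the current state. `margin_select` is our reading of the probability-margin rule: unmask the masked positions whose top-1 and top-2 predicted probabilities differ most. `oracle_log_likelihood` is a hypothetical brute-force search over one-token-at-a-time orderings for a single sequence. It is feasible only for tiny lengths and may differ from the paper's oracle procedure. All names here are illustrative.

```python
import math
from itertools import permutations, product
from typing import Callable, List, Optional, Sequence

MASK = None  # hypothetical marker for a still-masked position
Dist = List[float]
State = List[Optional[int]]


def toy_denoiser(z: State, vocab: int = 4) -> List[Dist]:
    """Hypothetical MDM stand-in: one distribution over `vocab` tokens per position."""
    revealed = sum(t for t in z if t is not MASK)
    dists = []
    for i in range(len(z)):
        favored = (revealed + i) % vocab
        weights = [1.0] * vocab
        # Confidence depends on the position and on the revealed tokens,
        # so the margin ranking can change as unmasking proceeds.
        weights[favored] += 1.0 + (revealed + 2 * i) % 3
        total = sum(weights)
        dists.append([w / total for w in weights])
    return dists


def margin_select(k: int) -> Callable[[State, List[Dist]], List[int]]:
    """Unmask the k masked positions with the largest top-1 minus top-2 probability."""
    def select(z: State, probs: List[Dist]) -> List[int]:
        masked = [i for i, t in enumerate(z) if t is MASK]

        def margin(i: int) -> float:
            top = sorted(probs[i], reverse=True)
            return top[0] - top[1]

        # Descending margin, ties broken by position index, so the rule is deterministic.
        masked.sort(key=lambda i: (-margin(i), i))
        return masked[:k]
    return select


def duel_log_likelihood(x: Sequence[int], denoiser, select) -> float:
    """Exact log-probability that the deterministic-selection sampler emits x."""
    z: State = [MASK] * len(x)
    total = 0.0
    while any(t is MASK for t in z):
        probs = denoiser(z)        # one forward pass on the current state
        chosen = select(z, probs)  # depends only on z, never on unrevealed tokens of x
        for i in chosen:
            total += math.log(probs[i][x[i]])
        for i in chosen:
            z[i] = x[i]            # reveal the ground-truth tokens, as sampling would
    return total


def perplexity(log_lik: float, length: int) -> float:
    return math.exp(-log_lik / length)


def oracle_log_likelihood(x: Sequence[int], denoiser) -> float:
    """Best one-token-at-a-time ordering for this particular x (brute force)."""
    best = -math.inf
    for order in permutations(range(len(x))):
        # Step t unmasks order[t]; t equals the number of tokens already revealed.
        def fixed(z, probs, order=order):
            return [order[sum(t is not MASK for t in z)]]
        best = max(best, duel_log_likelihood(x, denoiser, fixed))
    return best


if __name__ == "__main__":
    x = [1, 3, 0]
    for k in (1, 2):
        ll = duel_log_likelihood(x, toy_denoiser, margin_select(k))
        print(f"margin, {k} token(s)/step: PPL = {perplexity(ll, len(x)):.3f}")
    oracle = oracle_log_likelihood(x, toy_denoiser)
    print(f"oracle ordering: PPL = {perplexity(oracle, len(x)):.3f}")
    # Sanity check: the sampler's probabilities over all 4**3 sequences sum to 1.
    mass = sum(math.exp(duel_log_likelihood(list(s), toy_denoiser, margin_select(2)))
               for s in product(range(4), repeat=3))
    print(f"total probability mass: {mass:.6f}")
```

Scoring costs one forward pass per unmasking step, the same as sampling with that rule. The check that probabilities over every sequence sum to 1 shows that, in this toy setting, the result is a normalized likelihood rather than a bound. The oracle instead chooses its ordering after seeing x. Its scores are therefore a per-sequence ceiling, not a valid model likelihood.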

