LG · ML · Nov 24, 2025

Masked Diffusion Models are Secretly Learned-Order Autoregressive Models

arXiv:2511.19152v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses a fundamental limitation in diffusion models for discrete data, potentially improving performance in generative tasks, though it appears incremental as it builds on existing MDM frameworks.

The paper tackles the problem of optimizing the decoding order in Masked Diffusion Models (MDMs) for generative modeling over discrete domains. It shows that equipping MDMs with multivariate noise schedules lets them identify and optimize for favorable decoding orders during training, which establishes them as autoregressive models with learnable orders.

Masked Diffusion Models (MDMs) have emerged as one of the most promising paradigms for generative modeling over discrete domains. It is known that MDMs effectively train to decode tokens in a random order, and that this ordering has significant performance implications in practice. This observation raises a fundamental question: can we design a training framework that optimizes for a favorable decoding order? We answer this in the affirmative, showing that the continuous-time variational objective of MDMs, when equipped with multivariate noise schedules, can identify and optimize for a decoding order during training. We establish a direct correspondence between decoding order and the multivariate noise schedule and show that this setting breaks the invariance of the MDM objective to the noise schedule. Furthermore, we prove that the MDM objective decomposes precisely into weighted auto-regressive losses over these orders, which establishes them as auto-regressive models with learnable orders.
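
To make the link between per-position noise schedules and decoding order concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the paper: the schedule family P(position l is masked by time t) = t^(w_l), the weights w_l, and the function names are all assumptions chosen for simplicity. Under that assumed family, the position masked last in the forward process is the one revealed first in the reverse process, and the first decoded position is l with probability w_l / sum(w). Equal weights therefore give a uniformly random order, while unequal weights favor some orders over others.

```python
import numpy as np


def sample_decoding_orders(weights, num_samples, seed=0):
    """Sample decoding orders induced by per-position masking schedules.

    Assumption (illustrative, not from the paper): position l is masked
    by time t in [0, 1] with probability t ** weights[l].
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    u = rng.random((num_samples, w.size))
    # Inverse-CDF sampling of each position's masking time T_l.
    mask_times = u ** (1.0 / w)
    # The reverse process unmasks the latest-masked position first,
    # so sort positions by decreasing masking time.
    return np.argsort(-mask_times, axis=1)


def first_position_frequencies(orders, num_positions):
    """Empirical probability that each position is decoded first."""
    counts = np.bincount(orders[:, 0], minlength=num_positions)
    return counts / orders.shape[0]


if __name__ == "__main__":
    uniform = sample_decoding_orders([1.0, 1.0, 1.0], 20000)
    skewed = sample_decoding_orders([1.0, 2.0, 5.0], 20000)
    print(first_position_frequencies(uniform, 3))
    print(first_position_frequencies(skewed, 3))
```

With a single shared schedule every position is equally likely to be decoded first, which matches the random-order behavior the abstract describes. Giving each position its own schedule skews that distribution, which is the kind of degree of freedom a learnable multivariate schedule could exploit.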
