LGSTMLTHMay 17

Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space

arXiv:2605.1723295.81 citations
AI Analysis

This work provides the first convergence theory for discrete diffusion models that scales to large vocabularies (e.g., hundreds of thousands of tokens) by removing the state-space-size dependence that made prior bounds vacuous for modern language tasks.

The paper develops a unified adjoint-equation-based framework for discrete diffusion models that achieves dimension-free convergence guarantees in any integral probability metric (IPM), eliminating dependence on state space size $S$ for both masked and uniform priors. The bounds are the first to be entirely free of $S$ and rely only on a standard rate-matrix regularity assumption.

Discrete diffusion has become a leading framework for generative modeling in various applications including language, vision, and biology. Existing convergence theory, however, exhibits fundamental limitations. KL-based analyses diverge under singular priors such as the masked distribution, while bounds in total variation (TV) depend on the state space size $S$ and become vacuous for modern language tasks, where vocabularies contain hundreds of thousands of tokens. We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric (IPM). To the best of our knowledge, our bounds are the first to be entirely free of $S$ and applicable to both masked and uniform priors. Importantly, our theory relies only on a single standard rate-matrix regularity assumption and is compatible with time-inhomogeneous schedules. Four novel techniques drive our improvements: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes $S$-dependence under uniform transitions, and a score-marginal cancellation technique that removes $S$-dependence under masked transitions. Our framework thus sharply departs from prior analyses and avoids the shortcomings of pathspace-KL and existing TV-based approaches. Beyond convergence bounds, our framework provides a versatile toolkit for further theoretical study of discrete diffusion models.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes