Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space

Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Markos A. Katsoulakis

arXiv:2605.17232

AI Analysis

This work provides the first convergence theory for discrete diffusion models that scales to large vocabularies (e.g., hundreds of thousands of tokens) by removing the state-space-size dependence that made prior bounds vacuous for modern language tasks.

The paper develops a unified adjoint-equation-based framework for discrete diffusion models that yields dimension-free convergence guarantees in any integral probability metric (IPM), with no dependence on the state space size $S$ under either masked or uniform priors. To the authors' knowledge, these are the first bounds entirely free of $S$; they rely on a single standard rate-matrix regularity assumption and remain valid for time-inhomogeneous schedules.

Discrete diffusion has become a leading framework for generative modeling in various applications including language, vision, and biology. Existing convergence theory, however, exhibits fundamental limitations. KL-based analyses diverge under singular priors such as the masked distribution, while bounds in total variation (TV) depend on the state space size $S$ and become vacuous for modern language tasks, where vocabularies contain hundreds of thousands of tokens. We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric (IPM). To the best of our knowledge, our bounds are the first to be entirely free of $S$ and applicable to both masked and uniform priors. Importantly, our theory relies only on a single standard rate-matrix regularity assumption and is compatible with time-inhomogeneous schedules. Four novel techniques drive our improvements: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes $S$-dependence under uniform transitions, and a score-marginal cancellation technique that removes $S$-dependence under masked transitions. Our framework thus sharply departs from prior analyses and avoids the shortcomings of pathspace-KL and existing TV-based approaches. Beyond convergence bounds, our framework provides a versatile toolkit for further theoretical study of discrete diffusion models.
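To make the phrase "working in the space of observables via adjoint equations" concrete, the following is a minimal toy sketch and not the paper's construction. It uses the standard duality for a continuous-time Markov chain with rate matrix $Q_t$: the marginal solves the forward equation $\partial_t p_t = Q_t^\top p_t$, an observable solves the adjoint equation $\partial_t u_t = -Q_t u_t$ with $u_T = f$, and the two views agree through $\langle p_T, f \rangle = \langle p_0, u_0 \rangle$. The function names, the schedule `1 + t`, the four-state space, and the Euler discretization are all assumptions chosen for illustration.

```python
# Toy illustration (assumed setup, not the paper's method):
# forward (measure) view vs. adjoint (observable) view of a small CTMC.

def matvec(A, x):
    """Return the product A x for a list-of-lists matrix A and a list vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]

def dot(x, y):
    """Return the pairing <x, y>."""
    return sum(a * b for a, b in zip(x, y))

def rate_matrix(t, n):
    """Hypothetical time-inhomogeneous uniform-type rate matrix.

    Off-diagonal entries are beta(t) / n and every row sums to zero.
    """
    beta = 1.0 + t  # assumed schedule, for illustration only
    return [[beta / n if i != j else -beta * (n - 1) / n for j in range(n)]
            for i in range(n)]

def forward_marginal(p0, T, steps):
    """Explicit Euler for the forward equation dp/dt = Q_t^T p, from 0 to T."""
    n, h = len(p0), T / steps
    p = list(p0)
    for k in range(steps):
        dp = matvec(transpose(rate_matrix(k * h, n)), p)
        p = [pi + h * di for pi, di in zip(p, dp)]
    return p

def adjoint_observable(f, T, steps):
    """Explicit Euler, run backward from T to 0, for du/dt = -Q_t u with u(T) = f."""
    n, h = len(f), T / steps
    u = list(f)
    for k in reversed(range(steps)):
        du = matvec(rate_matrix(k * h, n), u)
        u = [ui + h * di for ui, di in zip(u, du)]
    return u

p0 = [1.0, 0.0, 0.0, 0.0]   # start in state 0
f = [0.5, -1.0, 2.0, 0.0]   # an arbitrary test function (observable)
T, steps = 1.0, 1000

pT = forward_marginal(p0, T, steps)
u0 = adjoint_observable(f, T, steps)
lhs = dot(pT, f)    # expectation of f computed by evolving the measure
rhs = dot(p0, u0)   # same expectation computed by evolving the observable
print(abs(lhs - rhs))
```

Because both loops apply the same per-step matrices $I + hQ_{t_k}$, one transposed and one not, the two pairings agree up to floating-point rounding. An IPM is defined as $d_{\mathcal{F}}(p, q) = \sup_{f \in \mathcal{F}} |\langle p, f \rangle - \langle q, f \rangle|$, so controlling the evolved observable for every $f$ in the class $\mathcal{F}$ is one route to controlling the metric without tracking the measures entrywise.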
