Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space
This work provides the first convergence theory for discrete diffusion models that scales to large vocabularies (e.g., hundreds of thousands of tokens) by removing the state-space-size dependence that made prior bounds vacuous for modern language tasks.
The paper develops a unified adjoint-equation-based framework for discrete diffusion models that yields dimension-free convergence guarantees in any integral probability metric (IPM), with no dependence on the state space size $S$ under either masked or uniform priors. To the authors' knowledge, the bounds are the first to be entirely free of $S$, and they rely on a single standard rate-matrix regularity assumption.
Discrete diffusion has become a leading framework for generative modeling in various applications including language, vision, and biology. Existing convergence theory, however, exhibits fundamental limitations. KL-based analyses diverge under singular priors such as the masked distribution, while bounds in total variation (TV) depend on the state space size $S$ and become vacuous for modern language tasks, where vocabularies contain hundreds of thousands of tokens. We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric (IPM). To the best of our knowledge, our bounds are the first to be entirely free of $S$ and applicable to both masked and uniform priors. Importantly, our theory relies only on a single standard rate-matrix regularity assumption and is compatible with time-inhomogeneous schedules. Four novel techniques drive our improvements: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes $S$-dependence under uniform transitions, and a score-marginal cancellation technique that removes $S$-dependence under masked transitions. Our framework thus sharply departs from prior analyses and avoids the shortcomings of pathspace-KL and existing TV-based approaches. Beyond convergence bounds, our framework provides a versatile toolkit for further theoretical study of discrete diffusion models.
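To make the phrase "working in the space of observables via adjoint equations" concrete, the following is a minimal sketch of the standard duality it refers to. It is not the paper's exact argument. The notation is assumed for illustration and does not come from the abstract: $\mathcal{F}$ is a class of test functions, $Q_t$ and $\hat{Q}_t$ are the exact and approximate rate matrices (rows summing to zero), $p_t$ and $\hat{p}_t$ are the corresponding marginals written as row vectors, and $T$ is the time horizon. Time discretization is ignored.

An IPM compares two distributions through a class of observables:

$$
d_{\mathcal{F}}(\mu, \nu) = \sup_{f \in \mathcal{F}} \left| \mathbb{E}_{\mu}[f] - \mathbb{E}_{\nu}[f] \right| .
$$

Taking $\mathcal{F} = \{ f : \|f\|_\infty \le 1 \}$ recovers total variation up to a constant that depends on the normalization convention. The marginals evolve by the forward equation, while the observable $u_t(x) = \mathbb{E}[f(X_T) \mid X_t = x]$ evolves by its adjoint, the backward equation:

$$
\partial_t p_t = p_t Q_t, \qquad \partial_t u_t = -Q_t u_t, \quad u_T = f .
$$

The pairing $\langle p_t, u_t \rangle = \sum_x p_t(x)\, u_t(x)$ is then constant in time, since $\frac{d}{dt} \langle p_t, u_t \rangle = p_t Q_t u_t - p_t Q_t u_t = 0$. Differentiating $\langle \hat{p}_t, u_t \rangle$ instead, with $\partial_t \hat{p}_t = \hat{p}_t \hat{Q}_t$, and integrating over $[0, T]$ gives

$$
\mathbb{E}_{\hat{p}_T}[f] - \mathbb{E}_{p_T}[f]
= \langle \hat{p}_0 - p_0,\, u_0 \rangle
+ \int_0^T \langle \hat{p}_t,\, (\hat{Q}_t - Q_t)\, u_t \rangle \, dt .
$$

Taking the supremum over $f \in \mathcal{F}$ turns this identity into a bound on $d_{\mathcal{F}}(\hat{p}_T, p_T)$. In this sketch the error is controlled through the regularity of $u_t$ rather than through a density ratio between the two path measures, which is one way to see why an observable-based analysis can remain finite under singular priors where KL-based bounds diverge.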