Convergence of denoising diffusion models under the manifold hypothesis
This work addresses a foundational problem in the theory of diffusion models, of interest to both researchers and practitioners, by extending the analysis beyond the restrictive assumption that the target distribution admits a density with respect to the Lebesgue measure.
The paper closes a theoretical gap for denoising diffusion models by proving convergence results for target distributions that are supported on lower-dimensional manifolds or given by empirical distributions, with quantitative bounds on the Wasserstein distance of order one between the target and generative distributions.
Denoising diffusion models are a recent class of generative models exhibiting state-of-the-art performance in image and audio synthesis. Such models approximate the time-reversal of a forward noising process from a target distribution to a reference density, which is usually Gaussian. Despite their strong empirical results, the theoretical analysis of such models remains limited. In particular, all current approaches crucially assume that the target distribution admits a density w.r.t. the Lebesgue measure. This does not cover settings where the target distribution is supported on a lower-dimensional manifold or is given by some empirical distribution. In this paper, we bridge this gap by providing the first convergence results for diffusion models in this more general setting. In particular, we provide quantitative bounds on the Wasserstein distance of order one between the target data distribution and the generative distribution of the diffusion model.
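To make the setting concrete, the following minimal Python sketch illustrates a target distribution supported on a lower-dimensional manifold and a forward noising process applied to it. Everything in it is an illustrative assumption rather than the construction analysed in the paper: the target is taken to be uniform on the unit circle in R^2, the forward process is assumed to be the Ornstein-Uhlenbeck dynamics dX_t = -X_t dt + sqrt(2) dB_t, and the helper names (sample_circle, forward_ou, w1_1d) are hypothetical. The sketch covers only the forward process, not the learned time-reversal. It compares the radii of the noised samples with the radii of the target, which are all equal to one; since the norm map is 1-Lipschitz, this one-dimensional Wasserstein distance is a lower bound on the Wasserstein distance of order one in R^2.

```python
import numpy as np


def sample_circle(n, rng):
    """Draw n points uniformly from the unit circle, a 1-D manifold in R^2.

    This target has no density w.r.t. the Lebesgue measure on R^2.
    """
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.stack([np.cos(theta), np.sin(theta)], axis=1)


def forward_ou(x0, t, rng):
    """Sample X_t given X_0 = x0 under the assumed OU forward process.

    For dX = -X dt + sqrt(2) dB the transition is Gaussian:
    X_t = exp(-t) * x0 + sqrt(1 - exp(-2t)) * Z, with Z ~ N(0, I).
    """
    z = rng.standard_normal(x0.shape)
    return np.exp(-t) * x0 + np.sqrt(1.0 - np.exp(-2.0 * t)) * z


def w1_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    In one dimension the optimal coupling matches sorted values.
    """
    a, b = np.sort(a), np.sort(b)
    return float(np.mean(np.abs(a - b)))


rng = np.random.default_rng(0)
x0 = sample_circle(5000, rng)
target_radii = np.linalg.norm(x0, axis=1)  # all equal to 1 up to rounding

# Radial W1 between the noised marginal and the target for several times t.
radial_w1 = {}
for t in (1.0, 0.1, 0.01):
    xt = forward_ou(x0, t, rng)
    radial_w1[t] = w1_1d(np.linalg.norm(xt, axis=1), target_radii)
    print(f"t = {t:5.2f}   radial W1 = {radial_w1[t]:.3f}")
```

For every t > 0 the noised marginal is a Gaussian mixture and therefore has a Lebesgue density, even though the target does not. As t decreases, the printed radial distance shrinks, which is the sense in which a distance such as the Wasserstein distance remains meaningful for manifold-supported targets where density-based divergences are not directly applicable.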