Diffusion Model for Manifold Data: Score Decomposition, Curvature, and Statistical Complexity
This work provides foundational statistical insights for generative modeling on manifolds, advancing theory for diffusion models in structured data applications.
The paper addresses the theoretical understanding of diffusion models for high-dimensional data concentrated on low-dimensional structures, showing that the statistical rates for score estimation and distribution learning are governed by the intrinsic dimension of the data and the curvature of the manifold.
Diffusion models have become a leading framework in generative modeling, yet their theoretical understanding, especially for high-dimensional data concentrated on low-dimensional structures, remains incomplete. This paper investigates how diffusion models learn such structured data, focusing on two key aspects: statistical complexity and the influence of the data's geometric properties. By modeling data as samples from a smooth Riemannian manifold, our analysis reveals crucial decompositions of the score function in diffusion models under different levels of injected noise. We also highlight the interplay between manifold curvature and the structure of the score function. These analyses enable an efficient neural network approximation of the score function, on which we build to establish statistical rates for score estimation and distribution learning. Remarkably, the obtained statistical rates are governed by the intrinsic dimension of the data and the manifold curvature. These results advance the statistical foundations of diffusion models, bridging theory and practice for generative modeling on manifolds.
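As a rough illustration of what a noise-dependent score decomposition can look like, the sketch below computes the score of Gaussian-smoothed data on the unit circle in R^2 (intrinsic dimension 1) and splits it into components normal and tangent to the manifold. This is a toy example under our own assumptions, not the construction used in the paper: the circle, the query point, the scaling `alpha`, the noise levels, and the helper `empirical_score` are all hypothetical choices made for illustration.

```python
import numpy as np


def empirical_score(x, data, alpha, sigma):
    """Score (gradient of the log-density) at x of the equal-weight
    Gaussian mixture with centers alpha * data_i and covariance sigma^2 I.

    Hypothetical helper for illustration; uses the identity
    score(x) = (E[alpha * x0 | x] - x) / sigma^2.
    """
    centers = alpha * data                          # (n, d) mixture centers
    sq_dist = np.sum((centers - x) ** 2, axis=1)    # squared distance to each center
    logw = -sq_dist / (2.0 * sigma ** 2)
    logw -= logw.max()                              # stabilize the softmax
    w = np.exp(logw)
    w /= w.sum()                                    # posterior weights over centers
    posterior_mean = w @ centers                    # E[alpha * x0 | x]
    return (posterior_mean - x) / sigma ** 2


# Assumed toy manifold: a dense, evenly spaced grid on the unit circle.
theta = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)

# Query point slightly off the manifold, at radius r = 1.1.
x = np.array([1.1, 0.0])
normal = x / np.linalg.norm(x)                # unit normal to the circle at the nearest point
tangent = np.array([-normal[1], normal[0]])   # unit tangent at the nearest point

for sigma in (0.5, 0.1, 0.05):
    s = empirical_score(x, circle, alpha=1.0, sigma=sigma)
    print(f"sigma={sigma}: normal={s @ normal:.4f}, tangent={s @ tangent:.2e}")
```

In this toy case the tangential component vanishes by symmetry, while the normal component points back toward the circle and grows as the noise shrinks. For small `sigma` it is approximately -(r - 1)/sigma^2 - 1/(2r) with r = 1.1: a projection term that scales like 1/sigma^2 plus a bounded term that depends on the circle's curvature. This is a standard calculation for the circle and is offered only as intuition for the abstract's claim, not as the paper's decomposition.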