ML LG PR STDec 1, 2025

Dimension-free error estimate for diffusion model and optimal scheduling

Valentin de Bortoli, Romuald Elie, Anna Kazeykina, Zhenjie Ren, Jiacheng Zhang

arXiv:2512.01820v114.03 citationsh-index: 30

Originality Incremental advance

AI Analysis

This provides theoretical guarantees for diffusion models in high-dimensional settings, addressing a key limitation of previous error metrics that scaled poorly with dimension.

The authors tackled the problem of quantifying errors in diffusion generative models, deriving a dimension-free bound on the discrepancy between generated and true data distributions using a smooth test functional metric, and applied this to derive an optimal time-scheduling strategy that minimizes discretization error.

Diffusion generative models have emerged as powerful tools for producing synthetic data from an empirically observed distribution. A common approach involves simulating the time-reversal of an Ornstein-Uhlenbeck (OU) process initialized at the true data distribution. Since the score function associated with the OU process is typically unknown, it is approximated using a trained neural network. This approximation, along with finite time simulation, time discretization and statistical approximation, introduce several sources of error whose impact on the generated samples must be carefully understood. Previous analyses have quantified the error between the generated and the true data distributions in terms of Wasserstein distance or Kullback-Leibler (KL) divergence. However, both metrics present limitations: KL divergence requires absolute continuity between distributions, while Wasserstein distance, though more general, leads to error bounds that scale poorly with dimension, rendering them impractical in high-dimensional settings. In this work, we derive an explicit, dimension-free bound on the discrepancy between the generated and the true data distributions. The bound is expressed in terms of a smooth test functional with bounded first and second derivatives. The key novelty lies in the use of this weaker, functional metric to obtain dimension-independent guarantees, at the cost of higher regularity on the test functions. As an application, we formulate and solve a variational problem to minimize the time-discretization error, leading to the derivation of an optimal time-scheduling strategy for the reverse-time diffusion. Interestingly, this scheduler has appeared previously in the literature in a different context; our analysis provides a new justification for its optimality, now grounded in minimizing the discretization bias in generative sampling.

View on arXiv PDF

Similar