Evaluating the design space of diffusion-based generative models
It offers a theoretical foundation for optimizing diffusion model design, addressing a gap in prior work that assumed score function accuracy, which is incremental but important for researchers and practitioners in generative AI.
This paper provides a non-asymptotic convergence analysis of denoising score matching under gradient descent and a refined sampling error analysis for variance exploding models, yielding a full error analysis that guides the design of training and sampling processes for diffusion-based generative models.
Most existing theoretical investigations of the accuracy of diffusion models, albeit significant, assume the score function has been approximated to a certain accuracy, and then use this a priori bound to control the error of generation. This article instead provides a first quantitative understanding of the whole generation process, i.e., both training and sampling. More precisely, it conducts a non-asymptotic convergence analysis of denoising score matching under gradient descent. In addition, a refined sampling error analysis for variance exploding models is also provided. The combination of these two results yields a full error analysis, which elucidates (again, but this time theoretically) how to design the training and sampling processes for effective generation. For instance, our theory implies a preference toward noise distribution and loss weighting in training that qualitatively agree with the ones used in [Karras et al., 2022]. It also provides perspectives on the choices of time and variance schedules in sampling: when the score is well trained, the design in [Song et al., 2021] is more preferable, but when it is less trained, the design in [Karras et al., 2022] becomes more preferable.