Noise in the reverse process improves the approximation capabilities of diffusion models
This work offers a theoretical explanation for the empirical success of stochasticity in generative modeling, which may inform model design and sampler choice for machine learning researchers and practitioners.
The paper addresses why stochastic reverse processes outperform deterministic ones in score-based generative models. It shows that neural stochastic differential equations (SDEs) can approximate trajectories in the $L^2$ norm, surpassing the Wasserstein-metric approximation achieved by neural ODEs, even without Lipschitz conditions. It also relaxes the Lipschitz requirement on the gradient of the data distribution.
In score-based generative models (SGMs), the state of the art in generative modeling, stochastic reverse processes are known to perform better than their deterministic counterparts. This paper investigates the source of this phenomenon by comparing neural ordinary differential equations (ODEs) and neural stochastic differential equations (SDEs) as reverse processes. We adopt a control-theoretic perspective, posing the approximation of the reverse process as a trajectory tracking problem, and analyze the ability of neural SDEs to approximate trajectories of the Fokker-Planck equation, which reveals the advantages of stochasticity. First, neural SDEs exhibit a strong regularizing effect: they achieve trajectory approximation in the $L^2$ norm, surpassing the Wasserstein-metric approximation achieved by neural ODEs under similar conditions, even when the reference vector field or score function is not Lipschitz. Applying this result, we establish the class of distributions that can be sampled using score matching in SGMs, relaxing the Lipschitz requirement on the gradient of the data distribution found in existing literature. Second, we show that this approximation property is preserved when the network width is limited to the input dimension of the network. In this limited-width case, the weights act as control inputs, framing our analysis as a controllability problem for neural SDEs in probability density space. This sheds light on how noise helps steer the system toward the desired solution and helps explain the empirical success of stochasticity in generative modeling.
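To make the two kinds of reverse process concrete, here is a minimal sketch built on a toy setup that is our own assumption and not the paper's construction. It uses a 1-D Ornstein-Uhlenbeck forward process $dX = -X\,dt + \sqrt{2}\,dW$ and Gaussian data $\mathcal{N}(2, 0.5^2)$, so every marginal $p_t$ and its score are available in closed form. It integrates the standard reverse-time SDE, $dY = [Y + 2\nabla \log p_{T-\tau}(Y)]\,d\tau + \sqrt{2}\,dW$, and the probability-flow ODE, $dY = [Y + \nabla \log p_{T-\tau}(Y)]\,d\tau$. In continuous time both share the same Fokker-Planck trajectory $p_t$. The exact score stands in for a learned neural vector field, and the names `MU`, `S0`, `marginal`, and `sample` are hypothetical.

```python
import math
import random
import statistics

# Hypothetical toy setup (assumption, not from the paper):
# forward OU process dX = -X dt + sqrt(2) dW started from data N(MU, S0^2).
MU, S0 = 2.0, 0.5


def marginal(t):
    """Closed-form mean and variance of the forward marginal p_t."""
    decay = math.exp(-t)
    return MU * decay, S0**2 * decay**2 + (1.0 - decay**2)


def sample(stochastic, n=2000, T=3.0, steps=300, seed=0):
    """Integrate a reverse process from tau = 0 (t = T) to tau = T (t = 0).

    stochastic=True : reverse SDE dY = [Y + 2*score] dtau + sqrt(2) dW (Euler-Maruyama)
    stochastic=False: probability-flow ODE dY = [Y + score] dtau (explicit Euler)
    """
    rng = random.Random(seed)
    h = T / steps
    m_T, v_T = marginal(T)
    # Start from the exact terminal marginal p_T so only discretization error remains.
    ys = [rng.gauss(m_T, math.sqrt(v_T)) for _ in range(n)]
    for k in range(steps):
        t = T - k * h  # forward time matching reverse time tau = k * h
        m, v = marginal(t)
        for i, y in enumerate(ys):
            score = -(y - m) / v  # exact score of the Gaussian p_t at y
            if stochastic:
                noise = math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
                ys[i] = y + (y + 2.0 * score) * h + noise
            else:
                ys[i] = y + (y + score) * h
    return ys


for label, flag in [("reverse SDE", True), ("probability-flow ODE", False)]:
    ys = sample(flag)
    print(f"{label}: mean={statistics.fmean(ys):.3f}, std={statistics.pstdev(ys):.3f}")
```

Both samplers should end close to mean 2.0 and standard deviation 0.5, up to $O(h)$ discretization error and Monte Carlo noise. Because the score here is exact and Lipschitz, this toy does not exercise the regime the paper studies (learned fields, non-Lipschitz scores, limited width). It only pins down the two objects being compared.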