LG · NA · ML · Nov 27, 2023

Closing the ODE-SDE gap in score-based diffusion models through the Fokker-Planck equation

arXiv:2311.15996v1 · 10 citations · h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in generative modeling research, but it is an incremental advance because it builds on existing diffusion-model frameworks.

The paper tackles the performance gap between ODE and SDE dynamics in score-based diffusion models. It derives an upper bound on the Wasserstein-2 distance between the ODE- and SDE-induced distributions in terms of a Fokker-Planck residual, and shows numerically that adding this residual as a regularization term reduces the gap, though this may degrade SDE sample quality.
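For orientation, the objects compared here can be written in the standard notation of score-based diffusion. The symbols f, g, s_θ and R_θ are labels chosen for this note. The exact residual, norm and constants used in the paper's bound may differ, so treat this as a sketch of the setting rather than the paper's statement. The score Fokker-Planck equation follows by dividing the density Fokker-Planck equation by p_t, using Δp/p = Δ log p + ‖∇ log p‖², and taking the gradient.

```latex
\begin{align*}
% Forward (noising) SDE with drift f and diffusion coefficient g
\mathrm{d}x &= f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t \\
% Reverse-time SDE driven by a learned score s_\theta \approx \nabla_x \log p_t
\mathrm{d}x &= \bigl[f(x,t) - g(t)^2\, s_\theta(x,t)\bigr]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar W_t \\
% Probability flow ODE (same marginals as the SDE when s_\theta = \nabla_x \log p_t)
\frac{\mathrm{d}x}{\mathrm{d}t} &= f(x,t) - \tfrac{1}{2}\, g(t)^2\, s_\theta(x,t) \\
% Density Fokker-Planck equation of the forward SDE
\partial_t p_t &= -\nabla_x\cdot\bigl(f\,p_t\bigr) + \tfrac{1}{2}\, g(t)^2\,\Delta_x p_t \\
% Score Fokker-Planck equation satisfied by s = \nabla_x \log p_t
\partial_t s &= \nabla_x\Bigl[\tfrac{1}{2}\, g(t)^2\bigl(\nabla_x\cdot s + \lVert s\rVert^2\bigr) - \nabla_x\cdot f - f\cdot s\Bigr] \\
% Fokker-Planck residual of the learned score (zero for the exact score)
R_\theta(x,t) &= \partial_t s_\theta - \nabla_x\Bigl[\tfrac{1}{2}\, g(t)^2\bigl(\nabla_x\cdot s_\theta + \lVert s_\theta\rVert^2\bigr) - \nabla_x\cdot f - f\cdot s_\theta\Bigr]
\end{align*}
```

When s_θ is the true score, R_θ vanishes and the two dynamics share their marginals. The paper's bound controls the Wasserstein-2 distance between the ODE- and SDE-induced distributions through a residual of this kind.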

Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling, due to their state-of-the-art performance in many generation tasks while relying on mathematical foundations such as stochastic differential equations (SDEs) and ordinary differential equations (ODEs). Empirically, it has been reported that ODE-based samples are inferior to SDE-based samples. In this paper we rigorously describe the range of dynamics and approximations that arise when training score-based diffusion models, including the true SDE dynamics, the neural approximations, the various approximate particle dynamics that result, their associated Fokker-Planck equations, and the neural network approximations of these Fokker-Planck equations. We systematically analyse the difference between the ODE and SDE dynamics of score-based diffusion models and link it to an associated Fokker-Planck equation. We derive a theoretical upper bound on the Wasserstein-2 distance between the ODE- and SDE-induced distributions in terms of a Fokker-Planck residual. We also show numerically, using explicit comparisons, that conventional score-based diffusion models can exhibit significant differences between ODE- and SDE-induced distributions. Moreover, we show numerically that reducing the Fokker-Planck residual by adding it as an additional regularisation term closes the gap between ODE- and SDE-induced distributions. Our experiments suggest that this regularisation can improve the distribution generated by the ODE, but that this can come at the cost of degraded SDE sample quality.
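A minimal one-dimensional sketch can make the quantities in the abstract concrete. It rests on assumptions that are not from the paper. The forward process is the Ornstein-Uhlenbeck SDE dx = -x dt + sqrt(2) dW. The data are Gaussian, so the exact score is known in closed form. An imperfect "learned" score is imitated by the hypothetical perturbation eps·sin(x).

The sketch evaluates the score Fokker-Planck residual by finite differences. It then samples with the reverse SDE and with the probability flow ODE using Euler steps. Finally, it measures the ODE-SDE gap with the empirical 1D Wasserstein-2 distance, which reduces to matching sorted samples. It illustrates the quantities only; it is not the authors' model or regulariser.

```python
import numpy as np

# Toy setting (assumption, not from the paper): forward OU SDE
#   dx = f(x, t) dt + g dW  with  f(x, t) = -x,  g = sqrt(2)  (so g^2/2 = 1),
# and Gaussian data N(M0, V0), so p_t is Gaussian with a closed-form score.
M0, V0 = 1.0, 0.25


def exact_score(x, t):
    """grad_x log p_t for the toy OU process started from N(M0, V0)."""
    mu = M0 * np.exp(-t)
    var = V0 * np.exp(-2 * t) + 1.0 - np.exp(-2 * t)
    return -(x - mu) / var


def perturbed_score(x, t, eps=0.1):
    """Hypothetical stand-in for an imperfect learned score s_theta."""
    return exact_score(x, t) + eps * np.sin(x)


def fp_residual(score, x, t, h=1e-3, ht=1e-4):
    """Score Fokker-Planck residual for the toy process,
        R = d_t s - d_x[ (g^2/2)(d_x s + s^2) - d_x f - f s ],
    which with f = -x, d_x f = -1, g^2/2 = 1 becomes
        R = d_t s - d_x[ d_x s + s^2 + 1 + x s ].
    All derivatives are central finite differences."""
    def phi(y):
        s = score(y, t)
        ds = (score(y + h, t) - score(y - h, t)) / (2 * h)
        return ds + s**2 + 1.0 + y * s

    dt_s = (score(x, t + ht) - score(x, t - ht)) / (2 * ht)
    dx_phi = (phi(x + h) - phi(x - h)) / (2 * h)
    return dt_s - dx_phi


def sample(score, n=20000, T=3.0, steps=300, stochastic=True, seed=0):
    """Integrate from t=T back to t=0 with explicit Euler steps:
    reverse SDE  dx = [f - g^2 s] dt + g dW_bar   (stochastic=True), or
    flow ODE     dx = [f - (g^2/2) s] dt          (stochastic=False)."""
    rng = np.random.default_rng(seed)
    mu_T = M0 * np.exp(-T)
    var_T = V0 * np.exp(-2 * T) + 1.0 - np.exp(-2 * T)
    x = mu_T + np.sqrt(var_T) * rng.standard_normal(n)  # draw from exact p_T
    dt = T / steps
    for k in range(steps):
        t = T - k * dt
        if stochastic:
            drift = -x - 2.0 * score(x, t)
            x = x - drift * dt + np.sqrt(2.0 * dt) * rng.standard_normal(n)
        else:
            drift = -x - score(x, t)
            x = x - drift * dt
    return x


def w2_1d(a, b):
    """Empirical W2 between equal-size 1D samples: in 1D the optimal
    coupling pairs the sorted samples."""
    a, b = np.sort(a), np.sort(b)
    return float(np.sqrt(np.mean((a - b) ** 2)))


xs = np.linspace(-3.0, 3.0, 61)
for name, sc in [("exact", exact_score), ("perturbed", perturbed_score)]:
    res = max(np.max(np.abs(fp_residual(sc, xs, t))) for t in (0.25, 0.5, 1.0))
    gap = w2_1d(sample(sc, stochastic=False), sample(sc, stochastic=True))
    print(f"{name}: max |FP residual| = {res:.2e}, W2(ODE, SDE) = {gap:.3f}")
```

For the exact score the residual is zero up to finite-difference error. For the perturbed score it is clearly nonzero. The printed W2 values also contain Monte Carlo and time-discretisation error, so only comparisons made at the same n and steps are meaningful.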
