Maximum Likelihood Training for Score-Based Diffusion ODEs by High-Order Denoising Score Matching
The paper addresses a theoretical gap in score-based generative modeling, offering an incremental improvement: it raises the likelihood of score-based diffusion ODEs without sacrificing generation quality.
The paper tackles the problem that first-order score matching alone is insufficient to maximize the likelihood of score-based diffusion ODEs. It shows that high-order denoising score matching enables maximum likelihood training, yielding better likelihood on synthetic data and CIFAR-10 while maintaining generation quality.
Score-based generative models have excellent performance in terms of generation quality and likelihood. They model the data distribution by matching a parameterized score network with first-order data score functions. The score network can be used to define an ODE ("score-based diffusion ODE") for exact likelihood evaluation. However, the relationship between the likelihood of the ODE and the score matching objective is unclear. In this work, we prove that matching the first-order score is not sufficient to maximize the likelihood of the ODE, by showing a gap between the maximum likelihood and score matching objectives. To close this gap, we show that the negative likelihood of the ODE can be bounded by controlling the first, second, and third-order score matching errors, and we further present a novel high-order denoising score matching method to enable maximum likelihood training of score-based diffusion ODEs. Our algorithm guarantees that the higher-order matching error is bounded by the training error and the lower-order errors. We empirically observe that with high-order score matching, score-based diffusion ODEs achieve better likelihood on both synthetic data and CIFAR-10, while retaining the high generation quality.
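To make the first-order objective mentioned above concrete, the following is a minimal sketch of a denoising score matching loss, not the paper's high-order method. It rests on assumptions that are not taken from the paper: a Gaussian perturbation kernel q(x_t | x_0) = N(x_0, sigma^2 I), toy data x_0 ~ N(0, I) so that the marginal score is known in closed form, and hypothetical helper names (`dsm_loss`, `true_score`, `biased_score`).

```python
import numpy as np


def dsm_loss(score_fn, x0, sigma, rng):
    """Monte Carlo estimate of the first-order denoising score matching loss.

    Assumes the perturbation kernel q(x_t | x_0) = N(x_0, sigma^2 I),
    whose score at x_t is -(x_t - x_0) / sigma^2 = -eps / sigma.
    """
    eps = rng.standard_normal(x0.shape)
    xt = x0 + sigma * eps
    target = -eps / sigma  # score of the perturbation kernel at x_t
    residual = score_fn(xt, sigma) - target
    return float(np.mean(np.sum(residual ** 2, axis=-1)))


def true_score(xt, sigma):
    # Toy assumption: x_0 ~ N(0, I), so x_t ~ N(0, (1 + sigma^2) I)
    # and the marginal score is -x_t / (1 + sigma^2).
    return -xt / (1.0 + sigma ** 2)


def biased_score(xt, sigma):
    # A deliberately wrong model: the true score shifted by a constant.
    return true_score(xt, sigma) + 0.5


sigma = 0.5
x0 = np.random.default_rng(0).standard_normal((20000, 2))

# Reuse the same noise seed so both models are scored on identical samples.
loss_true = dsm_loss(true_score, x0, sigma, np.random.default_rng(1))
loss_biased = dsm_loss(biased_score, x0, sigma, np.random.default_rng(1))

print("DSM loss, true score:  ", loss_true)
print("DSM loss, biased score:", loss_biased)
```

In this toy setting the true marginal score attains the lower loss, which is the sense in which the first-order objective matches the score. The abstract's point is that a small value of this objective alone does not guarantee a high likelihood for the associated ODE, which is why the second- and third-order matching errors are also controlled.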