Diffusion Probabilistic Modeling for Video Generation
This work advances video generation for applications such as simulation and media, although the contribution is incremental: it extends existing diffusion methods from images to video.
The paper tackles video generation by adapting denoising diffusion probabilistic models to autoregressively generate future frames, achieving significant improvements in perceptual quality across four datasets and outperforming five baselines in probabilistic forecasting metrics.
Denoising diffusion probabilistic models are a promising new class of generative models that mark a milestone in high-quality image generation. This paper showcases their ability to sequentially generate video, surpassing prior methods in perceptual and probabilistic forecasting metrics. We propose an autoregressive, end-to-end optimized video diffusion model inspired by recent advances in neural video compression. The model successively generates future frames by correcting a deterministic next-frame prediction using a stochastic residual generated by an inverse diffusion process. We compare this approach against five baselines on four datasets involving natural and simulation-based videos. We find significant improvements in terms of perceptual quality for all datasets. Furthermore, by introducing a scalable version of the Continuous Ranked Probability Score (CRPS) applicable to video, we show that our model also outperforms existing approaches in their probabilistic frame forecasting ability.
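To make the probabilistic forecasting metric concrete, the sketch below estimates CRPS per pixel from a set of sampled future frames and averages over all pixels. It uses the standard ensemble estimator, CRPS ≈ E|X − y| − ½·E|X − X′|, which is an assumption for illustration: the abstract does not specify how the paper's scalable video variant is computed, so this is not necessarily the authors' formulation. The function name `pixelwise_crps` and the array layout (samples stacked along the first axis) are likewise hypothetical.

```python
import numpy as np


def pixelwise_crps(samples: np.ndarray, target: np.ndarray) -> float:
    """Ensemble CRPS estimate, averaged over all pixels (lower is better).

    samples: array of shape (S, ...) holding S sampled forecasts
             (hypothetical layout, e.g. (S, T, H, W) for S videos).
    target:  array of shape (...) holding the ground-truth frames.
    """
    samples = np.asarray(samples, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # First term: mean absolute error between each sample and the target.
    term_obs = np.abs(samples - target[None]).mean(axis=0)

    # Second term: mean absolute difference over all sample pairs.
    # This builds an (S, S, ...) array, so it suits small S only.
    term_pair = np.abs(samples[:, None] - samples[None, :]).mean(axis=(0, 1))

    # Per-pixel CRPS, then averaged into one score.
    return float((term_obs - 0.5 * term_pair).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.random((2, 8, 8))                           # 2 frames of 8x8 pixels
    forecasts = truth[None] + 0.1 * rng.standard_normal((16, 2, 8, 8))
    print(pixelwise_crps(forecasts, truth))
```

When every sample is identical, the pairwise term vanishes and the score reduces to the mean absolute error, so CRPS can be read as a generalization of that error to distributional forecasts.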