Generating time-consistent dynamics with discriminator-guided image diffusion models
This work addresses the generation of realistic spatiotemporal dynamics for applications such as weather prediction and climate simulation. It offers a more efficient alternative to training video diffusion models from scratch, although the contribution is incremental in that it builds on existing image diffusion models.
The authors address the challenge of generating realistic temporal dynamics without training video diffusion models (VDMs) from scratch by introducing a time-consistency discriminator that guides pretrained image diffusion models. Compared with a VDM trained from scratch, their approach performs equally well in temporal consistency, shows improved uncertainty calibration and lower biases, and enables stable centennial-scale climate simulations at daily time steps.
Realistic temporal dynamics are crucial for many video generation, processing and modelling applications, e.g. in computational fluid dynamics, weather prediction, or long-term climate simulations. Video diffusion models (VDMs) are the current state-of-the-art method for generating highly realistic dynamics. However, training VDMs from scratch can be challenging and requires large computational resources, limiting their wider application. Here, we propose a time-consistency discriminator that enables pretrained image diffusion models to generate realistic spatiotemporal dynamics. The discriminator guides the sampling process at inference time and does not require extensions or finetuning of the image diffusion model. We compare our approach against a VDM trained from scratch on an idealized turbulence simulation and a real-world global precipitation dataset. Our approach performs equally well in terms of temporal consistency, shows improved uncertainty calibration and lower biases compared to the VDM, and achieves stable centennial-scale climate simulations at daily time steps.
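To make the idea of guiding a frozen image model at sampling time concrete, the following toy sketch may help. It is not the authors' implementation, and everything in it is an assumption made for illustration: the "pretrained image model" is replaced by the analytic score of a standard Gaussian, the discriminator is a hypothetical quadratic logit that favours frames close to the previous one, the guidance is added to the score in the style of classifier guidance, and plain Langevin dynamics stands in for a real reverse-diffusion sampler. The function names (image_score, consistency_guidance, sample_next_frame, rollout) and the parameters tau and guidance_scale are likewise hypothetical.

```python
import numpy as np


def image_score(x):
    """Stand-in for a pretrained image model: score of a standard Gaussian."""
    return -x


def consistency_guidance(x, x_prev, tau=0.5):
    """Gradient w.r.t. x of a hypothetical discriminator logit
    -||x - x_prev||^2 / (2 * tau^2), which favours frames close to x_prev."""
    return -(x - x_prev) / tau**2


def sample_next_frame(x_prev, guidance_scale=1.0, n_steps=500, step=1e-2, seed=0):
    """Draw the next frame with Langevin dynamics on the guided score.

    The image score itself is never modified; guidance is only added on top.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(x_prev.shape)  # start from pure noise
    for _ in range(n_steps):
        score = image_score(x) + guidance_scale * consistency_guidance(x, x_prev)
        noise = rng.standard_normal(x.shape)
        x = x + step * score + np.sqrt(2.0 * step) * noise
    return x


def rollout(x0, n_frames, guidance_scale=1.0):
    """Assumed autoregressive rollout: each frame is guided by the previous one."""
    frames = [x0]
    for t in range(n_frames):
        frames.append(sample_next_frame(frames[-1], guidance_scale, seed=t + 1))
    return np.stack(frames)


# Compare guided and unguided samples against the previous frame.
x0 = np.full(64, 2.0)
guided = sample_next_frame(x0, guidance_scale=1.0)
unguided = sample_next_frame(x0, guidance_scale=0.0)
print("guided MSE to previous frame:  ", np.mean((guided - x0) ** 2))
print("unguided MSE to previous frame:", np.mean((unguided - x0) ** 2))
```

In this toy setting, setting guidance_scale to zero recovers independent samples from the image model, while a positive value pulls each new frame towards its predecessor without touching the image score. A real system would replace both stand-ins with trained networks and obtain the guidance gradient by automatic differentiation.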