CV · Dec 11, 2023

STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video Prediction

arXiv:2312.06486v1 · 21 citations · h-index: 9 · Has Code · AAAI
Originality: Incremental advance
AI Analysis

This work addresses video prediction for applications requiring high frame rates, but it is incremental as it builds on existing diffusion and stochastic methods.

The paper tackles future-frame prediction by learning the uncertainty of the underlying factors. The proposed model decomposes motion and content, uses a neural stochastic differential equation to predict temporal motion, and uses an image diffusion model to generate frames. The authors report state-of-the-art performance, and the model supports temporally continuous prediction at arbitrarily high frame rates.

Predicting future frames of a video is challenging because it is difficult to learn the uncertainty of the underlying factors influencing their contents. In this paper, we propose a novel video prediction model, which has infinite-dimensional latent variables over the spatio-temporal domain. Specifically, we first decompose the video motion and content information, then use a neural stochastic differential equation to predict the temporal motion information, and finally, an image diffusion model autoregressively generates the video frame by conditioning on the predicted motion feature and the previous frame. The better expressiveness and stronger stochasticity learning capability of our model lead to state-of-the-art video prediction performance. In addition, our model achieves temporally continuous prediction, i.e., predicting in an unsupervised way the future video frames with an arbitrarily high frame rate. Our code is available at https://github.com/XiYe20/STDiffProject.
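
To make the "temporally continuous" idea concrete, here is a minimal sketch of how a stochastic differential equation over a motion feature can be sampled at arbitrary timestamps. It is not the authors' implementation: the Euler-Maruyama solver is a generic choice, and `drift`, `diffusion`, and `euler_maruyama` are hypothetical stand-ins for the learned networks and solver a real model would use. The image diffusion model that would turn each motion feature into a frame is omitted.

```python
import math
import random

def drift(m, t):
    # Hypothetical stand-in for a learned drift network: pull each feature toward 0.
    return [-0.5 * x for x in m]

def diffusion(m, t):
    # Hypothetical stand-in for a learned diffusion network: constant noise scale.
    return [0.1 for _ in m]

def euler_maruyama(m0, timestamps, step=0.01, rng=None):
    """Integrate dm = drift dt + diffusion dW from t = 0 and
    return the state at each requested (non-decreasing) timestamp."""
    rng = rng or random.Random(0)
    m, t = list(m0), 0.0
    out = []
    for target in timestamps:
        while t < target - 1e-12:
            dt = min(step, target - t)
            f, g = drift(m, t), diffusion(m, t)
            # Brownian increment has standard deviation sqrt(dt).
            m = [x + fi * dt + gi * math.sqrt(dt) * rng.gauss(0.0, 1.0)
                 for x, fi, gi in zip(m, f, g)]
            t += dt
        out.append(list(m))
    return out

# Motion features at integer frame times, then at twice the frame rate.
coarse = euler_maruyama([1.0, -2.0], [1.0, 2.0, 3.0], rng=random.Random(42))
fine = euler_maruyama([1.0, -2.0], [0.5, 1.0, 1.5, 2.0, 2.5, 3.0],
                      rng=random.Random(42))
```

Because the solver accepts any list of timestamps, raising the frame rate only means querying more points along the same continuous trajectory. Each call draws a fresh noise path, which is where the stochasticity of the prediction would come from in a setup like this.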

Code Implementations: 1 repo