CV · LG · IV · Sep 7, 2021

Simple Video Generation using Neural ODEs

arXiv:2109.03292v1 · 21 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses video generation for computer vision applications. Its contribution is incremental: it builds on existing Neural ODE models and evaluates only on a simple dataset, Moving MNIST.

The paper tackles conditional video generation by modeling time-continuous dynamics in a latent space using Neural ODEs, showing promising results for future frame prediction on the Moving MNIST dataset with 1 and 2 digits.

Despite having been studied to a great extent, the task of conditional generation of sequences of frames, or videos, remains extremely challenging. It is a common belief that a key step towards solving this task resides in modelling accurately both spatial and temporal information in video signals. A promising direction to do so has been to learn latent variable models that predict the future in latent space and project back to pixels, as suggested in recent literature. Following this line of work and building on top of a family of models introduced in prior work, Neural ODE, we investigate an approach that models time-continuous dynamics over a continuous latent space with a differential equation with respect to time. The intuition behind this approach is that these trajectories in latent space could then be extrapolated to generate video frames beyond the time steps for which the model is trained. We show that our approach yields promising results in the task of future frame prediction on the Moving MNIST dataset with 1 and 2 digits.
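The idea of integrating a latent trajectory and extrapolating it past the training horizon can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the dynamics function `latent_dynamics` (a fixed random `tanh` layer standing in for a learned network), the linear decoder `D`, the latent size, the 64x64 frame shape, and the hand-written fixed-step RK4 integrator, which stands in for whatever ODE solver the authors used.

```python
import numpy as np

def latent_dynamics(z, t, W):
    """Hypothetical dz/dt = f(z, t); a stand-in for a learned network."""
    return np.tanh(W @ z)

def rk4_step(f, z, t, dt, W):
    """One classical fourth-order Runge-Kutta step of size dt."""
    k1 = f(z, t, W)
    k2 = f(z + 0.5 * dt * k1, t + 0.5 * dt, W)
    k3 = f(z + 0.5 * dt * k2, t + 0.5 * dt, W)
    k4 = f(z + dt * k3, t + dt, W)
    return z + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate(f, z0, times, W, substeps=10):
    """Return the latent state at every entry of `times`, with z0 at times[0]."""
    z, states = z0, [z0]
    for t0, t1 in zip(times[:-1], times[1:]):
        dt = (t1 - t0) / substeps
        t = t0
        for _ in range(substeps):
            z = rk4_step(f, z, t, dt, W)
            t += dt
        states.append(z)
    return np.stack(states)

rng = np.random.default_rng(0)
latent_dim, frame_shape = 8, (64, 64)  # illustrative sizes
W = rng.normal(scale=0.5, size=(latent_dim, latent_dim))  # untrained weights
D = rng.normal(scale=0.1, size=(frame_shape[0] * frame_shape[1], latent_dim))

z0 = rng.normal(size=latent_dim)  # would come from an encoder of the input frames

# Time steps covered during training, then time steps beyond them.
train_times = np.linspace(0.0, 1.0, 10)
future_times = np.linspace(1.0, 2.0, 11)

z_train = integrate(latent_dynamics, z0, train_times, W)
# Extrapolation: keep integrating the same ODE from the last known state.
z_future = integrate(latent_dynamics, z_train[-1], future_times, W)

# Project latent states back to pixel space with the hypothetical decoder.
frames = (z_future @ D.T).reshape(-1, *frame_shape)
```

Because time is continuous, `future_times` does not have to follow the training frame rate; any increasing sequence of time stamps can be passed to `integrate`.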
