CV · Sep 22, 2017

Learning to Generate Time-Lapse Videos Using Multi-Stage Dynamic Generative Adversarial Networks

arXiv:1709.07592v3 · 200 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of predicting the immediate future of a visual scene, such as how clouds will move, for applications in video synthesis and simulation.

The paper tackles the problem of generating realistic time-lapse videos from a single initial frame. The model produces videos of up to 128x128 resolution and 32 frames, and the authors report that it outperforms state-of-the-art models in both quantitative and qualitative experiments.

Taking a photo outside, can we predict the immediate future, e.g., how would the cloud move in the sky? We address this problem by presenting a generative adversarial network (GAN) based two-stage approach to generating realistic time-lapse videos of high resolution. Given the first frame, our model learns to generate long-term future frames. The first stage generates videos of realistic contents for each frame. The second stage refines the generated video from the first stage by enforcing it to be closer to real videos with regard to motion dynamics. To further encourage vivid motion in the final generated video, Gram matrix is employed to model the motion more precisely. We build a large scale time-lapse dataset, and test our approach on this new dataset. Using our model, we are able to generate realistic videos of up to $128\times 128$ resolution for 32 frames. Quantitative and qualitative experiment results have demonstrated the superiority of our model over the state-of-the-art models.
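As a rough illustration of the Gram matrix idea mentioned in the abstract, the sketch below computes channel-wise Gram matrices over video feature maps and compares them between two clips. It is not the authors' implementation: the feature shape `(C, T, H, W)`, the normalization by the number of positions, the mean-absolute-difference comparison, and the function names `gram_matrix` and `gram_distance` are all assumptions made for illustration.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel Gram matrix of a video feature map.

    features: array of shape (C, T, H, W) (assumed layout).
    Returns a (C, C) matrix of channel correlations, averaged
    over all T*H*W spatio-temporal positions (assumed normalization).
    """
    channels = features.shape[0]
    flat = features.reshape(channels, -1)   # (C, T*H*W)
    return flat @ flat.T / flat.shape[1]    # (C, C)

def gram_distance(feat_a, feat_b):
    """Mean absolute difference between two Gram matrices (illustrative choice)."""
    return float(np.abs(gram_matrix(feat_a) - gram_matrix(feat_b)).mean())

# Random stand-ins for features extracted from a real and a generated clip.
rng = np.random.default_rng(0)
real = rng.standard_normal((8, 32, 16, 16))
fake = rng.standard_normal((8, 32, 16, 16))

g = gram_matrix(real)
d_same = gram_distance(real, real)
d_diff = gram_distance(real, fake)
```

Because the Gram matrix discards where a feature occurs and keeps only how channels co-activate, a distance of this kind penalizes mismatched feature statistics rather than pixel-level misalignment.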

Code Implementations: 3 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
