CV · Dec 3, 2018

TwoStreamVAN: Improving Motion Modeling in Video Generation

arXiv:1812.01037v2 · 19 citations · Has Code
AI Analysis

This paper addresses poor motion modeling in video generation, a computer vision problem, and represents an incremental improvement over existing methods.

The paper tackles the challenge of generating realistic videos by disentangling motion and content generation, proposing TwoStreamVAN, which outperforms existing methods on datasets like Weizmann Human Action, MUG Facial Expression, and VoxCeleb.

Video generation is an inherently challenging task, as it requires modeling realistic temporal dynamics as well as spatial content. Existing methods entangle the two intrinsically different tasks of motion and content creation in a single generator network, but this approach struggles to simultaneously generate plausible motion and content. To improve motion modeling in video generation tasks, we propose a two-stream model that disentangles motion generation from content generation, called a Two-Stream Variational Adversarial Network (TwoStreamVAN). Given an action label and a noise vector, our model is able to create clear and consistent motion, and thus yields photorealistic videos. The key idea is to progressively generate and fuse multi-scale motion with its corresponding spatial content. Our model significantly outperforms existing methods on the standard Weizmann Human Action, MUG Facial Expression, and VoxCeleb datasets, as well as our new dataset of diverse human actions with challenging and complex motion. Our code is available at https://github.com/sunxm2357/TwoStreamVAN/.
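
The two-stream idea described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' architecture or code: the random linear projections stand in for learned networks, and every name and constant here (`SCALES`, `NOISE_DIM`, `NUM_ACTIONS`, `make_stream`, `run_stream`, `fuse`, `generate_video`), along with the choice to split the noise into a per-clip content code and a per-frame motion code, is an assumption made for illustration only. It shows just the data flow: a content stream and an action-conditioned motion stream each produce maps at several scales, and the maps are fused coarse to fine.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_ACTIONS = 4      # hypothetical number of action classes
NOISE_DIM = 8        # hypothetical noise dimensionality
SCALES = (4, 8, 16)  # hypothetical spatial resolutions, coarse to fine


def one_hot(label, num_classes):
    """Encode an integer action label as a one-hot vector."""
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec


def upsample2x(x):
    """Nearest-neighbour upsampling of an (H, W) map to (2H, 2W)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def make_stream(in_dim):
    """Random linear projections standing in for a learned stream, one per scale."""
    return {s: rng.standard_normal((s * s, in_dim)) * 0.1 for s in SCALES}


def run_stream(weights, code):
    """Project a latent code to one (s, s) feature map per scale."""
    return {s: np.tanh(w @ code).reshape(s, s) for s, w in weights.items()}


def fuse(content_maps, motion_maps):
    """Coarse-to-fine fusion: upsample the running frame, then add the
    content and motion maps of the current scale (additive fusion is an
    assumption of this sketch)."""
    frame = np.zeros((SCALES[0], SCALES[0]))
    for i, s in enumerate(SCALES):
        if i > 0:
            frame = upsample2x(frame)
        frame = frame + content_maps[s] + motion_maps[s]
    return frame


CONTENT_W = make_stream(NOISE_DIM)               # content stream: noise only
MOTION_W = make_stream(NOISE_DIM + NUM_ACTIONS)  # motion stream: noise + action


def generate_video(action_label, num_frames=5):
    """Generate a toy clip: content is fixed per clip, motion varies per frame."""
    z_content = rng.standard_normal(NOISE_DIM)
    content_maps = run_stream(CONTENT_W, z_content)
    cond = one_hot(action_label, NUM_ACTIONS)
    frames = []
    for _ in range(num_frames):
        z_motion = rng.standard_normal(NOISE_DIM)
        motion_maps = run_stream(MOTION_W, np.concatenate([z_motion, cond]))
        frames.append(fuse(content_maps, motion_maps))
    return np.stack(frames)


video = generate_video(action_label=2)
print(video.shape)  # (5, 16, 16)
```

Keeping the content code fixed across the clip while resampling only the motion code is one simple way to make the separation of the two streams visible; the released repository linked above is the reference for how the paper actually does it.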

Code Implementations: 1 repo
