CV, Jan 18, 2022

Autoencoding Video Latents for Adversarial Video Generation

arXiv:2201.06888v1
Originality: Incremental advance
AI Analysis

This work addresses video generation for AI and multimedia applications by proposing a method that improves control over sampling and robustness of training. It is an incremental advance, since it builds on existing approaches to disentangling motion and appearance.

The paper tackles the challenge of training robust and diverse GAN-based video generative models by proposing AVLAE, an adversarial video latent autoencoder that learns disentangled motion and appearance representations without explicit structural priors, achieving effective results as demonstrated through qualitative and quantitative experiments.

Given the three-dimensional complexity of a video signal, training a robust and diverse GAN-based video generative model is onerous due to the large stochasticity involved in the data space. Learning disentangled representations of the data helps to improve robustness and provides control in the sampling process. For video generation, there has been recent progress in this area by considering motion and appearance as orthogonal information and designing architectures that efficiently disentangle them. These approaches rely on handcrafted architectures that impose structural priors on the generator to decompose appearance and motion codes in the latent space. Inspired by recent advances in autoencoder-based image generation, we present AVLAE (Adversarial Video Latent AutoEncoder), a two-stream latent autoencoder in which the video distribution is learned by adversarial training. In particular, we propose to autoencode the motion and appearance latent vectors of the video generator in the adversarial setting. We demonstrate that our approach learns to disentangle motion and appearance codes even without explicit structural composition in the generator. Several experiments with qualitative and quantitative results demonstrate the effectiveness of our method.
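The idea of autoencoding latent vectors rather than pixels can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the mean-squared-error loss, the one-motion-value-per-frame layout, and the toy generator and encoder are all assumptions made for illustration. In a real system both networks would be learned, and a term like this would presumably be combined with the adversarial losses.

```python
def latent_reconstruction_loss(z_app, z_mot, generator, encoder):
    """Mean squared error between the sampled latent codes and the
    codes the encoder recovers from the generated video.

    z_app: list of floats, appearance code (assumed shared by all frames)
    z_mot: list of floats, one motion value per frame (an assumption)
    """
    video = generator(z_app, z_mot)
    z_app_hat, z_mot_hat = encoder(video)
    original = z_app + z_mot
    recovered = z_app_hat + z_mot_hat
    squared = [(a - b) ** 2 for a, b in zip(original, recovered)]
    return sum(squared) / len(squared)


# Hypothetical toy generator: each "frame" is the appearance code with
# that frame's motion value appended. A real generator is a neural network.
def toy_generator(z_app, z_mot):
    return [list(z_app) + [m] for m in z_mot]


# Hypothetical two-stream encoder that exactly inverts toy_generator:
# one stream reads appearance from the first frame, the other reads the
# per-frame motion value.
def toy_encoder(video):
    z_app_hat = video[0][:-1]
    z_mot_hat = [frame[-1] for frame in video]
    return z_app_hat, z_mot_hat


# Hypothetical encoder that ignores motion entirely, to show a nonzero loss.
def motion_blind_encoder(video):
    z_app_hat = video[0][:-1]
    z_mot_hat = [0.0 for _ in video]
    return z_app_hat, z_mot_hat


z_app = [0.5, -1.0]
z_mot = [1.0, 2.0]
perfect = latent_reconstruction_loss(z_app, z_mot, toy_generator, toy_encoder)
blind = latent_reconstruction_loss(z_app, z_mot, toy_generator, motion_blind_encoder)
print(perfect, blind)
```

The loss is zero only when the encoder recovers both streams from the generated video, which is the sense in which the latent codes, not the frames, are being autoencoded in this sketch.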

