Oct 1, 2017

Video Generation From Text

arXiv:1710.00421v1 · 312 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating videos from text, with applications in content creation and AI, but it is an incremental advance because it builds on existing generative models.

The paper tackles video generation from text by training a conditional generative model that extracts static and dynamic information, using a hybrid VAE-GAN framework and automatically creating a matched text-video corpus. It shows that the framework generates plausible and diverse videos, significantly outperforming baseline models adapted from text-to-image generation.
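To give a rough feel for the VAE half of the hybrid VAE-GAN framework, the sketch below shows the two pieces a Gaussian VAE encoder typically relies on: the reparameterization trick for sampling a latent code, and the closed-form KL penalty toward a standard normal prior. It is not the paper's architecture. The linear text encoder, its weights, the 2-d text embedding, and the 3-d latent are hypothetical placeholders, and the sampled latent only stands in for a text-conditioned "gist" code.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def encode(text_emb: Vector, w_mu: List[Vector], w_log_var: List[Vector]) -> Tuple[Vector, Vector]:
    """Hypothetical linear encoder: text embedding -> (mu, log_var) of a diagonal Gaussian."""
    mu = [sum(w * x for w, x in zip(row, text_emb)) for row in w_mu]
    log_var = [sum(w * x for w, x in zip(row, text_emb)) for row in w_log_var]
    return mu, log_var


def reparameterize(mu: Vector, log_var: Vector, rng: random.Random) -> Vector:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), elementwise (sigma = exp(log_var / 2))."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Vector, log_var: Vector) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


if __name__ == "__main__":
    rng = random.Random(0)
    text_emb = [0.5, -1.0]                            # hypothetical text embedding
    w_mu = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]       # hypothetical encoder weights
    w_log_var = [[0.0, 0.0] for _ in range(3)]        # log_var = 0, i.e. unit variance
    mu, log_var = encode(text_emb, w_mu, w_log_var)
    gist_latent = reparameterize(mu, log_var, rng)    # stand-in for a "gist" code
    print("gist latent:", gist_latent)
    print("KL term:", kl_to_standard_normal(mu, log_var))
```

Sampling through `reparameterize` keeps the randomness in `eps`, so gradients can flow through `mu` and `log_var` in a real (autodiff) implementation; the KL term is what keeps the text-conditioned latent close to the prior.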

Generating videos from text has proven to be a significant challenge for existing generative models. We tackle this problem by training a conditional generative model to extract both static and dynamic information from text. This is manifested in a hybrid framework, employing a Variational Autoencoder (VAE) and a Generative Adversarial Network (GAN). The static features, called "gist," are used to sketch text-conditioned background color and object layout structure. Dynamic features are considered by transforming input text into an image filter. To obtain a large amount of data for training the deep-learning model, we develop a method to automatically create a matched text-video corpus from publicly available online videos. Experimental results show that the proposed framework generates plausible and diverse videos, while accurately reflecting the input text information. It significantly outperforms baseline models that directly adapt text-to-image generation procedures to produce videos. Performance is evaluated both visually and by adapting the inception score used to evaluate image generation in GANs.
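The abstract states that dynamic features are obtained by transforming the input text into an image filter. A minimal sketch of that idea, under stated assumptions, is below: a single-channel gist image, a hypothetical linear map from a text embedding to a 3×3 kernel, and zero-padded "same"-size cross-correlation (the operation deep-learning convolution layers actually compute). The shapes, weights, and single-filter setup are illustrative only; the paper's real filter generator and how its output feeds the video generator are not described here.

```python
from typing import List

Matrix = List[List[float]]


def text_to_kernel(text_emb: List[float], weight: Matrix, k: int) -> Matrix:
    """Hypothetical linear map: text embedding -> k*k filter weights, reshaped to k x k."""
    flat = [sum(w * x for w, x in zip(row, text_emb)) for row in weight]
    if len(flat) != k * k:
        raise ValueError("weight must have k*k rows")
    return [flat[i * k:(i + 1) * k] for i in range(k)]


def conv2d_same(image: Matrix, kernel: Matrix) -> Matrix:
    """2-D cross-correlation with zero padding; output has the input's size (odd kernel assumed)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside the image
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out


if __name__ == "__main__":
    gist = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]  # toy single-channel "gist"
    # Hypothetical weights: embedding dim 0 drives the center tap, dim 1 the right-neighbour tap.
    weight = [[0.0, 0.0] for _ in range(9)]
    weight[4] = [1.0, 0.0]
    weight[5] = [0.0, 1.0]
    for emb in ([1.0, 0.0], [0.0, 1.0]):  # two stand-in "sentences"
        kernel = text_to_kernel(emb, weight, 3)
        print(emb, "->", conv2d_same(gist, kernel))
```

Because the kernel is a function of the text, different inputs produce different filters over the same gist: here the first embedding yields an identity filter and the second shifts the gist content one pixel, a toy stand-in for text-dependent dynamic information.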
