CV · Mar 11, 2019

Video Generation from Single Semantic Label Map

arXiv:1903.04480v1 · 111 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of flexible, high-quality video synthesis for applications such as simulation and content creation. It is an incremental advance, building on existing image generation and optical flow generation methods.

The paper tackles video generation from a single semantic label map by decomposing the task into two steps: generating a high-quality first frame, then animating that frame with predicted optical flow. The method achieves state-of-the-art results on the Cityscapes dataset.
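The animation step can be illustrated with backward warping: each pixel of a later frame is filled by sampling the first frame at a location offset by the predicted flow. The following is a minimal NumPy sketch, not the paper's implementation. It assumes a dense flow field of shape (H, W, 2) holding (dx, dy) offsets that map each target pixel back into the first frame, bilinear interpolation, and clamping at the image border. The names `warp_backward` and `animate` are hypothetical, and occlusion handling is omitted.

```python
import numpy as np


def warp_backward(frame: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp `frame` (H, W, C) with `flow` (H, W, 2).

    Assumption: flow[y, x] = (dx, dy) points from output pixel (y, x)
    to its source location in `frame`. Output pixel (y, x) is sampled
    at (y + dy, x + dx) with bilinear interpolation; sample positions
    are clamped to the image border. `frame` must have a channel axis.
    """
    h, w = frame.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners of the 2x2 neighbourhood around each sample point.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets, given a trailing axis to broadcast over channels.
    wx = (src_x - x0)[..., None]
    wy = (src_y - y0)[..., None]

    # Interpolate along x on the top and bottom rows, then along y.
    top = (1 - wx) * frame[y0, x0] + wx * frame[y0, x1]
    bottom = (1 - wx) * frame[y1, x0] + wx * frame[y1, x1]
    return (1 - wy) * top + wy * bottom


def animate(first_frame: np.ndarray, flows: list) -> list:
    """Build a clip by warping the first frame with each flow field.

    Hypothetical convention: flows[t] maps frame t + 1 back to the
    first frame, so every later frame is sampled from frame 0.
    """
    return [first_frame] + [warp_backward(first_frame, f) for f in flows]
```

For example, a constant flow of `dx = 1` shifts content one pixel to the left, and the last column repeats the border value because of clamping. Sampling every frame from the first frame (rather than chaining warps frame to frame) avoids accumulating interpolation blur, which is one reason a single high-quality first frame is a useful anchor.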

This paper proposes the novel task of video generation conditioned on a single semantic label map, which offers a good balance between flexibility and quality in the generation process. Unlike typical end-to-end approaches, which model both scene content and dynamics in a single step, we decompose this difficult task into two sub-problems. Because current image generation methods produce finer detail than video generation methods, we synthesize high-quality content by generating only the first frame. We then animate the scene according to its semantics to obtain a temporally coherent video. We employ a conditional variational autoencoder (cVAE) to predict optical flow as an intermediate step, generating a video sequence conditioned on the single initial frame. Integrating the semantic label map into the flow prediction module substantially improves the image-to-video generation process. Extensive experiments on the Cityscapes dataset show that our method outperforms all competing methods.
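The cVAE-based flow prediction can be outlined through the standard ingredients of a conditional VAE: a latent code drawn with the reparameterization trick, a KL penalty that pulls the approximate posterior toward a standard normal prior, and a condition built from the first frame and the semantic label map. This is a hedged NumPy sketch under stated assumptions, not the paper's architecture: the encoder and decoder networks are omitted, conditioning by channel concatenation and an L1 flow reconstruction loss are illustrative choices, and all function names (`one_hot`, `build_condition`, `reparameterize`, `kl_to_standard_normal`, `cvae_loss`) are hypothetical.

```python
import numpy as np


def one_hot(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Convert an (H, W) integer label map to an (H, W, num_classes) one-hot map."""
    return np.eye(num_classes)[labels]


def build_condition(first_frame: np.ndarray, label_map: np.ndarray,
                    num_classes: int) -> np.ndarray:
    """Assumed conditioning: stack the (H, W, 3) first frame with its
    one-hot semantic label map along the channel axis."""
    return np.concatenate([first_frame, one_hot(label_map, num_classes)], axis=-1)


def reparameterize(mu: np.ndarray, logvar: np.ndarray,
                   rng: np.random.Generator) -> np.ndarray:
    """Sample z = mu + sigma * eps with eps ~ N(0, I) and sigma = exp(logvar / 2)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def kl_to_standard_normal(mu: np.ndarray, logvar: np.ndarray) -> np.ndarray:
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over the last axis."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=-1)


def cvae_loss(flow_true: np.ndarray, flow_pred: np.ndarray,
              mu: np.ndarray, logvar: np.ndarray, beta: float = 1.0) -> float:
    """Assumed objective: mean L1 flow reconstruction error plus a
    beta-weighted mean KL term."""
    recon = np.mean(np.abs(flow_true - flow_pred))
    return float(recon + beta * np.mean(kl_to_standard_normal(mu, logvar)))
```

In a standard cVAE, training samples z from the encoder's posterior via `reparameterize`, while at test time z is drawn directly from N(0, I) and decoded together with the condition, so different samples can yield different plausible motions for the same first frame.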

Code Implementations: 2 repos