CV, MM · Aug 21, 2019

Preserving Semantic and Temporal Consistency for Unpaired Video-to-Video Translation

arXiv:1908.07683v1 · 46 citations
AI Analysis

This work addresses video domain adaptation for applications such as video editing and simulation. It is incremental, however: it extends existing image translation methods to handle the temporal dimension.

The paper tackles unpaired video-to-video translation, where semantic inconsistency and temporal flickering degrade visual quality. It proposes a framework of specialized generators, discriminators, and loss functions that preserve content and temporal consistency, and it reports superior performance over prior methods in evaluations.

In this paper, we investigate the problem of unpaired video-to-video translation. Given a video in the source domain, we aim to learn the conditional distribution of the corresponding video in the target domain, without seeing any pairs of corresponding videos. While significant progress has been made in the unpaired translation of images, directly applying these methods to an input video leads to low visual quality due to the additional time dimension. In particular, previous methods suffer from semantic inconsistency (i.e., semantic label flipping) and temporal flickering artifacts. To alleviate these issues, we propose a new framework that is composed of carefully designed generators and discriminators, coupled with two core objective functions: 1) a content preserving loss and 2) a temporal consistency loss. Extensive qualitative and quantitative evaluations demonstrate the superior performance of the proposed method over previous approaches. We further apply our framework to a domain adaptation task and achieve favorable results.
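The abstract names two objectives, a content preserving loss and a temporal consistency loss, without defining them. The sketch below shows one common way such terms are formulated in the video translation literature; it is not the paper's exact definition. It assumes that temporal consistency is measured by warping the previous translated frame into the current frame with optical flow (typically estimated on the source video) and penalizing the remaining L1 difference outside occluded regions, and that content is compared as an L1 distance in some feature space. The flow field, the occlusion mask, and the `features` extractor are hypothetical inputs chosen for illustration.

```python
from typing import Callable, List, Optional, Tuple

Frame = List[List[float]]            # H x W grayscale frame
Flow = List[List[Tuple[int, int]]]   # per-pixel (dy, dx) from frame t back to frame t-1
Mask = List[List[bool]]


def warp_backward(prev: Frame, flow: Flow) -> Tuple[Frame, Mask]:
    """Warp frame t-1 into frame t's coordinates with nearest-neighbour sampling.

    Pixels whose source location falls outside the image are marked invalid.
    """
    h, w = len(prev), len(prev[0])
    warped = [[0.0] * w for _ in range(h)]
    valid = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy, sx = y + dy, x + dx
            if 0 <= sy < h and 0 <= sx < w:
                warped[y][x] = prev[sy][sx]
                valid[y][x] = True
    return warped, valid


def temporal_consistency_loss(out_prev: Frame, out_curr: Frame, flow: Flow,
                              occluded: Optional[Mask] = None) -> float:
    """Mean L1 difference between the current output and the warped previous output.

    Pixels that fall out of bounds after warping, or that the (hypothetical)
    occlusion mask flags, are excluded so that newly revealed content is not penalized.
    """
    warped, valid = warp_backward(out_prev, flow)
    total, count = 0.0, 0
    for y, row in enumerate(out_curr):
        for x, value in enumerate(row):
            if not valid[y][x] or (occluded is not None and occluded[y][x]):
                continue
            total += abs(value - warped[y][x])
            count += 1
    return total / count if count else 0.0


def content_preserving_loss(src: Frame, out: Frame,
                            features: Callable[[Frame], Frame] = lambda f: f) -> float:
    """Mean L1 distance between features of a source frame and its translation.

    `features` is a hypothetical stand-in for a pretrained feature extractor;
    the identity default reduces this to a pixel-level L1 penalty.
    """
    fs, fo = features(src), features(out)
    n = sum(len(row) for row in fs)
    return sum(abs(a - b) for ra, rb in zip(fs, fo) for a, b in zip(ra, rb)) / n


if __name__ == "__main__":
    prev = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
    # The scene shifts one pixel right: pixel (y, x) at t came from (y, x-1) at t-1.
    flow = [[(0, -1)] * 3 for _ in range(2)]
    stable = [[9.0, 0.0, 1.0], [9.0, 3.0, 4.0]]     # column 0 is newly revealed
    flicker = [[9.0, 0.5, 1.0], [9.0, 3.0, 4.0]]    # one pixel changes brightness
    print(temporal_consistency_loss(prev, stable, flow))   # 0.0
    print(temporal_consistency_loss(prev, flicker, flow))  # 0.125
    print(content_preserving_loss([[1.0, 2.0]], [[1.5, 2.0]]))  # 0.25
```

Nearest-neighbour sampling with integer flow keeps the example dependency-free; practical implementations usually apply bilinear sampling to sub-pixel flow from a learned estimator and compute both terms on batched tensors.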
