CV · Jul 3, 2020

Task-agnostic Temporally Consistent Facial Video Editing

arXiv:2007.01466v1 · 7 citations
AI Analysis

This work addresses inconsistent, task-specific editing of facial videos for media and entertainment applications. It appears incremental, however, because it builds on existing 3D reconstruction models.

The paper tackles the problem of visual flickers and lack of extensibility in facial video editing by proposing a task-agnostic framework that generates more photo-realistic and temporally smooth video portraits compared to state-of-the-art methods.

Recent research has made substantial advances in facial image editing. For video editing, however, previous methods either simply apply transformations frame by frame or use multiple frames in a concatenated or iterative fashion, which leads to noticeable visual flicker. In addition, these methods are restricted to one specific task at a time and cannot be extended to others. In this paper, we propose a task-agnostic, temporally consistent facial video editing framework. Built on a 3D reconstruction model, our framework handles several editing tasks in a more unified and disentangled manner. The core design includes a dynamic training sample selection mechanism and a novel 3D temporal loss constraint that fully exploits both image and video datasets and enforces temporal consistency. Compared with state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
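The abstract does not give the form of the 3D temporal loss, so the following is only a generic illustration of what a temporal consistency penalty over 3D reconstruction coefficients can look like, not the paper's actual loss. It assumes, hypothetically, that each frame is represented by a coefficient vector (for example, 3DMM expression or pose parameters) for both the source and the edited video, and it penalizes edited frame-to-frame changes that deviate from the source's frame-to-frame changes. A constant edit applied to every frame then costs nothing, while frame-wise jitter (flicker) is penalized. The function name `temporal_consistency_loss` is hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def temporal_consistency_loss(source: Sequence[Vector], edited: Sequence[Vector]) -> float:
    """Hypothetical temporal smoothness penalty; NOT the loss defined in the paper.

    `source` and `edited` hold one coefficient vector per frame (assumed to be
    3D reconstruction parameters of equal dimension). The loss is the mean, over
    consecutive frame pairs, of the squared mismatch between the edited motion
    (e[t+1] - e[t]) and the source motion (s[t+1] - s[t]).
    """
    if len(source) != len(edited):
        raise ValueError("source and edited must have the same number of frames")
    if len(source) < 2:
        return 0.0  # no frame pairs, so nothing can flicker
    total = 0.0
    for t in range(len(source) - 1):
        for s0, s1, e0, e1 in zip(source[t], source[t + 1], edited[t], edited[t + 1]):
            # How much the edited change differs from the source change for this coefficient
            diff = (e1 - e0) - (s1 - s0)
            total += diff * diff
    return total / (len(source) - 1)
```

For example, offsetting every source frame by the same amount (a temporally stable edit) yields a loss of 0, whereas an edit that jumps in one frame and stalls in the next yields a positive loss. Matching motion rather than absolute values is the design choice that lets the edit change appearance while still following the source video's dynamics.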
