CV · Oct 16, 2021

Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor

arXiv:2110.08580v1 · 7 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses video editing challenges for content creators by providing an easy-to-use tool with incremental improvements in lip-syncing and translation features.

The paper integrates modern talking face generation algorithms into a video editor to make video editing more efficient and improve its quality. Human evaluations indicate that the editor improves both the efficiency of human editors and the quality of the generated video.

This paper proposes a video editor based on OpenShot with several state-of-the-art facial video editing algorithms as added functionalities. Our editor provides an easy-to-use interface to apply modern lip-syncing algorithms interactively. Apart from lip-syncing, the editor also uses audio and facial re-enactment to generate expressive talking faces. The manual control improves the overall experience of video editing without missing out on the benefits of modern synthetic video generation algorithms. This control enables us to lip-sync complex dubbed movie scenes, interviews, television shows, and other visual content. Furthermore, our editor provides features that automatically translate lectures, covering the spoken content, the lip-sync of the professor, and background content such as slides. While doing so, we also tackle the critical aspect of synchronizing background content with the translated speech. We qualitatively evaluate the usefulness of the proposed editor by conducting human evaluations. Our evaluations show a clear improvement in the efficiency of human editors and in video generation quality. The supplementary material includes demo videos that explain the tool and showcase multiple results.

