CV · Jul 8, 2024

Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN

arXiv:2407.05577v1 · 8 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses the challenge of creating realistic talking head videos for applications like entertainment or virtual communication, though it appears incremental as it builds on existing StyleGAN and landmark-based approaches.

The paper tackles the poor visual effects of audio-driven talking head video editing by generating high-resolution, seamless videos with different emotions from speech, and it reports higher visual quality than state-of-the-art methods.

Existing methods for audio-driven talking head video editing suffer from poor visual effects. This paper tackles the problem by seamlessly editing talking face images with different emotions, using two modules: (1) an audio-to-landmark module, consisting of the Cross-Reconstructed Emotion Disentanglement and an alignment network, which bridges the gap between speech and facial motion by predicting the corresponding emotional landmarks from speech; and (2) a landmark-based editing module, which edits face videos via StyleGAN and aims to generate a seamless edited video that combines the emotion and content components of the input audio. Extensive experiments confirm that, compared with state-of-the-art methods, our method produces high-resolution videos with high visual quality.
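
The two-module data flow described above can be sketched roughly as follows. This is a minimal illustration of how the stages compose, not the authors' implementation: the names `TalkingHeadEditor`, `EditedFrame`, `toy_audio_to_landmark`, and `toy_landmark_editor` are hypothetical, the toy stages are simple stand-ins for the learned networks (no emotion disentanglement and no StyleGAN are involved), and the one-audio-window-per-frame pairing is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: one (x, y) pair per facial landmark.
Landmarks = List[Tuple[float, float]]


@dataclass
class EditedFrame:
    """Placeholder result: the source frame id plus the landmarks that drove the edit."""
    source: str
    landmarks: Landmarks


@dataclass
class TalkingHeadEditor:
    """Composes the two stages: audio -> landmarks, then landmarks -> edited frame."""
    audio_to_landmark: Callable[[Sequence[float], Landmarks], Landmarks]
    landmark_editor: Callable[[str, Landmarks], EditedFrame]

    def edit(
        self,
        audio_windows: Sequence[Sequence[float]],
        frames: Sequence[str],
        reference: Landmarks,
    ) -> List[EditedFrame]:
        # Assumption: exactly one audio window is paired with each video frame.
        if len(audio_windows) != len(frames):
            raise ValueError("expected one audio window per frame")
        edited = []
        for window, frame in zip(audio_windows, frames):
            landmarks = self.audio_to_landmark(window, reference)  # stage 1
            edited.append(self.landmark_editor(frame, landmarks))  # stage 2
        return edited


def toy_audio_to_landmark(window: Sequence[float], reference: Landmarks) -> Landmarks:
    """Toy stand-in for stage 1: shift landmarks vertically by mean absolute amplitude."""
    energy = sum(abs(s) for s in window) / max(len(window), 1)
    return [(x, y + energy) for x, y in reference]


def toy_landmark_editor(frame: str, landmarks: Landmarks) -> EditedFrame:
    """Toy stand-in for stage 2: record the target landmarks instead of synthesizing pixels."""
    return EditedFrame(source=frame, landmarks=landmarks)


editor = TalkingHeadEditor(toy_audio_to_landmark, toy_landmark_editor)
result = editor.edit(
    audio_windows=[[0.0, 0.0], [0.5, -0.5]],
    frames=["frame_0", "frame_1"],
    reference=[(0.0, 0.0), (1.0, 1.0)],
)
```

Keeping the two stages behind plain callables mirrors the separation in the abstract: the landmark predictor and the landmark-conditioned editor can be swapped independently, with landmarks as the only interface between them.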

