SD, CV, AS · Mar 26, 2025

Dual Audio-Centric Modality Coupling for Talking Head Generation

arXiv:2503.22728v1 · 2 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This work targets realistic virtual avatars and digital media. The advance is incremental: it builds on existing NeRF-based frameworks, adding new audio encoders and a fusion module.

The paper addresses audio-driven talking head video generation, focusing on lip synchronization and visual quality. The authors report that the method outperforms state-of-the-art approaches on key metrics such as lip synchronization accuracy and image quality.

The generation of audio-driven talking head videos is a key challenge in computer vision and graphics, with applications in virtual avatars and digital media. Traditional approaches often struggle with capturing the complex interaction between audio and facial dynamics, leading to lip synchronization and visual quality issues. In this paper, we propose a novel NeRF-based framework, Dual Audio-Centric Modality Coupling (DAMC), which effectively integrates content and dynamic features from audio inputs. By leveraging a dual encoder structure, DAMC captures semantic content through the Content-Aware Encoder and ensures precise visual synchronization through the Dynamic-Sync Encoder. These features are fused using a Cross-Synchronized Fusion Module (CSFM), enhancing content representation and lip synchronization. Extensive experiments show that our method outperforms existing state-of-the-art approaches in key metrics such as lip synchronization accuracy and image quality, demonstrating robust generalization across various audio inputs, including synthetic speech from text-to-speech (TTS) systems. Our results provide a promising solution for high-quality, audio-driven talking head generation and present a scalable approach for creating realistic talking heads.
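The abstract names a dual-encoder design whose outputs are fused by the CSFM, but it does not specify how the fusion is computed. The sketch below is one plausible reading, not the authors' implementation: it assumes the two encoders emit per-frame feature sequences of equal length and width, and it fuses them with bidirectional scaled dot-product cross-attention. The function names (`cross_attend`, `fuse`), the residual connections, the concatenation, and the feature sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(query, context):
    """Scaled dot-product attention: each query frame attends over all context frames."""
    d = query.shape[-1]
    scores = query @ context.T / np.sqrt(d)  # shape (T_query, T_context)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ context, weights


def fuse(content_feat, sync_feat):
    """Hypothetical fusion (an assumption, not the paper's CSFM):
    each stream attends to the other, a residual is added, and the
    two results are concatenated along the feature axis."""
    content_from_sync, _ = cross_attend(content_feat, sync_feat)
    sync_from_content, _ = cross_attend(sync_feat, content_feat)
    return np.concatenate(
        [content_feat + content_from_sync, sync_feat + sync_from_content],
        axis=-1,
    )


# Stand-in features: random arrays in place of real encoder outputs.
rng = np.random.default_rng(0)
T, D = 8, 16  # assumed number of audio frames and feature width
content = rng.standard_normal((T, D))  # stand-in for Content-Aware Encoder output
sync = rng.standard_normal((T, D))     # stand-in for Dynamic-Sync Encoder output

fused = fuse(content, sync)
print(fused.shape)  # (8, 32)
```

In a NeRF-based pipeline, a fused per-frame vector of this kind would typically serve as the conditioning input to the radiance field. Whether DAMC conditions its renderer this way is not stated in the abstract.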

