CV · AI · HC · LG · Aug 16, 2025

RealTalk: Realistic Emotion-Aware Lifelike Talking-Head Synthesis

arXiv:2508.12163v1 · h-index: 1 · 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
Originality: Incremental advance
AI Analysis

This work addresses the challenge of creating emotionally expressive, controllable AI-driven talking heads for socially intelligent applications. It represents an incremental improvement over prior methods.

The paper tackles the problem of generating realistic emotional expressions in talking-head synthesis, which current methods often fail to control accurately while preserving the subject's identity. It introduces RealTalk, a framework that outperforms existing methods in emotion accuracy, controllability, and identity preservation.

Emotion is a critical component of artificial social intelligence. However, while current methods excel in lip synchronization and image quality, they often fail to generate accurate and controllable emotional expressions while preserving the subject's identity. To address this challenge, we introduce RealTalk, a novel framework for synthesizing emotional talking heads with high emotion accuracy, enhanced emotion controllability, and robust identity preservation. RealTalk employs a variational autoencoder (VAE) to generate 3D facial landmarks from driving audio; these landmarks are concatenated with emotion-label embeddings and passed through a ResNet-based landmark deformation model (LDM) to produce emotional landmarks. The emotional landmarks and facial blendshape coefficients jointly condition a novel tri-plane attention Neural Radiance Field (NeRF) that synthesizes highly realistic emotional talking heads. Extensive experiments demonstrate that RealTalk outperforms existing methods in emotion accuracy, controllability, and identity preservation, advancing the development of socially intelligent AI systems.
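The landmark stage described in the abstract (audio to 3D landmarks via a VAE, then landmarks concatenated with an emotion-label embedding and passed through a ResNet-style deformation model) can be illustrated with a toy, dependency-free Python sketch. Everything in it is an assumption rather than the paper's method: the class names `ToyAudioLandmarkDecoder` and `ToyLandmarkDeformer`, the emotion label set, the 68-landmark count, the layer sizes, and the random untrained weights are all hypothetical. The sketch treats the VAE as a conditional VAE sampled from its prior at inference, replaces the ResNet with a single residual MLP block, assumes the LDM predicts offsets added to the input landmarks, and omits the tri-plane NeRF renderer entirely. It shows only the data flow.

```python
import random

# Hypothetical emotion label set; the paper's actual labels are not given here.
EMOTIONS = ["neutral", "happy", "sad", "angry", "surprised"]


def init_layer(n_out, n_in, rng, scale=0.1):
    """Random, untrained dense-layer parameters (weights W, bias b)."""
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Dense layer: y_i = sum_j W[i][j] * x[j] + b_i."""
    return [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


class ToyAudioLandmarkDecoder:
    """Hypothetical conditional-VAE decoder: at inference, sample z ~ N(0, I) from the
    prior and decode [z; audio_feat] into flattened 3D landmarks (x, y, z per point)."""

    def __init__(self, audio_dim, n_landmarks, latent_dim=4, seed=1):
        self.rng = random.Random(seed)
        self.latent_dim = latent_dim
        self.dec = init_layer(3 * n_landmarks, latent_dim + audio_dim, self.rng)

    def __call__(self, audio_feat):
        z = [self.rng.gauss(0.0, 1.0) for _ in range(self.latent_dim)]
        return linear(*self.dec, z + audio_feat)  # list "+" concatenates z and audio_feat


class ToyLandmarkDeformer:
    """Hypothetical stand-in for the ResNet-based LDM: concatenate landmarks with an
    emotion-label embedding, pass through one residual MLP block, and add the
    predicted per-coordinate offsets back onto the input landmarks."""

    def __init__(self, n_landmarks, emb_dim=8, hidden=32, seed=0):
        rng = random.Random(seed)
        self.embeddings = {e: [rng.gauss(0.0, 1.0) for _ in range(emb_dim)] for e in EMOTIONS}
        self.inp = init_layer(hidden, 3 * n_landmarks + emb_dim, rng)
        self.res = init_layer(hidden, hidden, rng)
        self.out = init_layer(3 * n_landmarks, hidden, rng)

    def __call__(self, landmarks, emotion):
        x = landmarks + self.embeddings[emotion]  # concatenation, not element-wise addition
        h = relu(linear(*self.inp, x))
        h = [a + r for a, r in zip(h, relu(linear(*self.res, h)))]  # skip connection
        offsets = linear(*self.out, h)
        return [p + d for p, d in zip(landmarks, offsets)]


if __name__ == "__main__":
    n_landmarks = 68          # hypothetical landmark count
    audio_feat = [0.1] * 16   # placeholder for a per-frame audio feature vector
    decoder = ToyAudioLandmarkDecoder(audio_dim=16, n_landmarks=n_landmarks)
    deformer = ToyLandmarkDeformer(n_landmarks)
    neutral = decoder(audio_feat)
    happy = deformer(neutral, "happy")
    print(len(neutral), len(happy))  # 204 204 (68 landmarks x 3 coordinates)
```

Predicting offsets that are added to the input landmarks, rather than regenerating landmarks from scratch, keeps the audio-driven geometry as the default whenever the predicted offsets are small; whether RealTalk's LDM actually uses this residual formulation is an assumption of this sketch.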
