CV | MM | Jul 8, 2025

MEDTalk: Multimodal Controlled 3D Facial Animation with Dynamic Emotions by Disentangled Embedding

arXiv:2507.06071v4 | 7 citations | h-index: 6 | Has Code | MM
Originality: Incremental advance
AI Analysis

The paper addresses the reliance on static, predefined emotion labels in audio-driven facial animation, a limitation that matters for entertainment and virtual production. The advance is incremental, since it builds on existing multimodal and disentanglement techniques.

The paper tackles the generation of 3D facial animations with synchronized lip movements and dynamic, fine-grained emotional expressions. It proposes MEDTalk, which disentangles content and emotion embeddings so that each can be controlled independently, and reports realistic results that integrate into industrial pipelines such as MetaHuman.

Audio-driven emotional 3D facial animation aims to generate synchronized lip movements and vivid facial expressions. However, most existing approaches focus on static and predefined emotion labels, limiting their diversity and naturalness. To address these challenges, we propose MEDTalk, a novel framework for fine-grained and dynamic emotional talking head generation. Our approach first disentangles content and emotion embedding spaces from motion sequences using a carefully designed cross-reconstruction process, enabling independent control over lip movements and facial expressions. Beyond conventional audio-driven lip synchronization, we integrate audio and speech text, predicting frame-wise intensity variations and dynamically adjusting static emotion features to generate realistic emotional expressions. Furthermore, to enhance control and personalization, we incorporate multimodal inputs, including text descriptions and reference expression images, to guide the generation of user-specified facial expressions. With MetaHuman as the priority, our generated results can be conveniently integrated into the industrial production pipeline. The code is available at: https://github.com/SJTU-Lucy/MEDTalk.
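
The abstract names two mechanisms without giving their form: cross-reconstruction between content and emotion embeddings, and frame-wise intensity adjustment of a static emotion feature. The sketch below illustrates one common way such ideas are set up. It is not the authors' implementation. The linear encoders and decoder, the dimensions, the four-clip pairing scheme, the mean-squared-error loss, and the multiplicative intensity scaling are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes: frames, motion dims (e.g. rig controls), embedding dims.
T, M, D = 8, 12, 4

# Hypothetical linear maps standing in for learned encoder/decoder networks.
W_content = rng.normal(size=(M, D))
W_emotion = rng.normal(size=(M, D))
W_decode = rng.normal(size=(2 * D, M))

def encode_content(x):
    """Map a motion sequence (T, M) to per-frame content embeddings (T, D)."""
    return x @ W_content

def encode_emotion(x):
    """Map a motion sequence (T, M) to per-frame emotion embeddings (T, D)."""
    return x @ W_emotion

def decode(content, emotion):
    """Reconstruct motion (T, M) from concatenated content and emotion embeddings."""
    return np.concatenate([content, emotion], axis=-1) @ W_decode

def cross_reconstruction_loss(x_c1e1, x_c2e2, x_c1e2, x_c2e1):
    """Swap emotion embeddings between two clips and compare each decoded
    result with the ground-truth clip that has that content/emotion pairing
    (assumed to be available in the training data)."""
    pred_c1e2 = decode(encode_content(x_c1e1), encode_emotion(x_c2e2))
    pred_c2e1 = decode(encode_content(x_c2e2), encode_emotion(x_c1e1))
    mse_a = np.mean((pred_c1e2 - x_c1e2) ** 2)
    mse_b = np.mean((pred_c2e1 - x_c2e1) ** 2)
    return float(mse_a + mse_b)

def modulate_emotion(static_emotion, intensity):
    """Scale a static emotion feature (D,) by a per-frame intensity (T,),
    giving a time-varying emotion feature of shape (T, D)."""
    return intensity[:, None] * static_emotion[None, :]

# Random stand-in clips; real data would be aligned motion sequences.
clips = [rng.normal(size=(T, M)) for _ in range(4)]
loss = cross_reconstruction_loss(*clips)
dynamic = modulate_emotion(rng.normal(size=D), np.linspace(0.0, 1.0, T))
```

Minimizing a loss of this shape pushes the content embedding to carry only what survives an emotion swap, which is what would make independent control of lip movement and expression possible. The linked repository is the place to check how MEDTalk actually defines these components.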

