CV · Jul 28, 2025

JOLT3D: Joint Learning of Talking Heads and 3DMM Parameters with Application to Lip-Sync

arXiv:2507.20452v1 · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses lip-sync quality in talking head generation for applications like animation or virtual avatars, representing an incremental improvement over prior methods.

The paper tackles talking head synthesis by jointly training a 3D face reconstruction model and a synthesis model, which yields a blendshape representation of facial expressions optimized for synthesis. This improves the quality of the generated face and enables a lip-sync pipeline that reduces flickering near the mouth.

In this work, we revisit the effectiveness of 3DMM for talking head synthesis by jointly learning a 3D face reconstruction model and a talking head synthesis model. This enables us to obtain a FACS-based blendshape representation of facial expressions that is optimized for talking head synthesis. This contrasts with previous methods that either fit 3DMM parameters to 2D landmarks or rely on pretrained face reconstruction models. Not only does our approach increase the quality of the generated face, but it also allows us to take advantage of the blendshape representation to modify just the mouth region for the purpose of audio-based lip-sync. To this end, we propose a novel lip-sync pipeline that, unlike previous methods, decouples the original chin contour from the lip-synced chin contour, and reduces flickering near the mouth.
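The abstract's claim that a blendshape representation lets the mouth be edited in isolation follows from the standard linear 3DMM form, S = S_mean + B_id·alpha + B_exp·beta: changing only the mouth-related expression coefficients in beta moves only the geometry those blendshapes span. The sketch below illustrates that idea under stated assumptions. The mean shape, bases, and the mouth index set `MOUTH_IDX` are hypothetical placeholders, and the sketch does not reproduce JOLT3D's learned reconstruction model or its chin-contour decoupling.

```python
import numpy as np

# Hypothetical indices of mouth-related expression blendshapes (for example,
# FACS-style units such as jaw open or lip pucker). A real rig defines its own set.
MOUTH_IDX = np.array([0, 1, 2])


def blendshape_mesh(mean_shape, id_basis, exp_basis, alpha, beta):
    """Linear 3DMM: S = S_mean + B_id @ alpha + B_exp @ beta.

    mean_shape: (3N,) flattened vertex positions
    id_basis:   (3N, K_id) identity basis
    exp_basis:  (3N, K_exp) expression (blendshape) basis
    """
    return mean_shape + id_basis @ alpha + exp_basis @ beta


def lipsync_expression(beta_video, beta_audio, mouth_idx=MOUTH_IDX):
    """Replace only mouth-related expression coefficients with audio-driven ones.

    Works for a single frame (K_exp,) or a sequence (T, K_exp); the input
    arrays are left unmodified.
    """
    beta = np.array(beta_video, copy=True)
    beta[..., mouth_idx] = np.asarray(beta_audio)[..., mouth_idx]
    return beta
```

Because the model is linear, the lip-synced mesh differs from the original only by `exp_basis[:, MOUTH_IDX] @ (beta_audio - beta_video)[MOUTH_IDX]`, so identity and non-mouth expression stay fixed.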
