CV · Jan 29, 2025

Learning Semantic Facial Descriptors for Accurate Face Animation

Lei Zhu, Yuanqi Chen, Xiaohang Liu, Thomas H. Li, Ge Li

Peking U

arXiv:2501.17718v16.21 citations · h-index: 28 · ICASSP

Originality: Incremental advance

AI Analysis

This work addresses the problem of high-fidelity face animation for applications in computer vision and graphics, offering an incremental improvement by combining aspects of model-based and model-free approaches.

The paper tackles face animation by introducing semantic facial descriptors in a learnable disentangled vector space, decoupling the facial space into identity and motion subspaces spanned by orthogonal basis vectors. It reports results on the VoxCeleb, HDTF, and CelebV benchmarks, where it outperforms state-of-the-art methods in identity preservation and motion transfer.

Face animation is a challenging task. Existing model-based methods (utilizing 3DMMs or landmarks) often produce a model-like reconstruction that fails to preserve identity effectively. Conversely, model-free approaches struggle to attain a decoupled and semantically rich feature space, which makes accurate motion transfer difficult. To address this dilemma, we introduce semantic facial descriptors in a learnable disentangled vector space. The approach decouples the facial space into identity and motion subspaces and endows each with semantics by learning complete orthogonal basis vectors. We obtain basis vector coefficients by applying an encoder to the source and driving faces, which yields effective facial descriptors in the identity and motion subspaces. These descriptors can then be recombined as latent codes to animate faces. Our approach thus addresses both the limited identity fidelity of model-based methods and the difficulty model-free methods have with accurate motion transfer. Extensive experiments are conducted on three challenging benchmarks (i.e., VoxCeleb, HDTF, CelebV). Comprehensive quantitative and qualitative results demonstrate that our model outperforms SOTA methods with superior identity preservation and motion transfer.
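The recombination step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a fixed random orthonormal basis in place of the learned one, uses a plain projection as a stand-in for the encoder, and picks the dimensions arbitrarily. All function names here are hypothetical.

```python
import numpy as np


def make_orthonormal_basis(dim, rng):
    # Assumption: a random orthonormal basis stands in for the learned one.
    # QR decomposition of a square Gaussian matrix gives orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q


def split_subspaces(basis, n_identity):
    # The first n_identity columns span the "identity" subspace,
    # the remaining columns span the "motion" subspace.
    return basis[:, :n_identity], basis[:, n_identity:]


def describe(feature, sub_basis):
    # Stand-in for the encoder: coefficients of the feature
    # with respect to the basis vectors of one subspace.
    return sub_basis.T @ feature


def recombine(id_coeffs, motion_coeffs, id_basis, motion_basis):
    # Latent code = identity component + motion component.
    return id_basis @ id_coeffs + motion_basis @ motion_coeffs


rng = np.random.default_rng(0)
dim, n_identity = 8, 5  # hypothetical sizes, chosen only for the demo

basis = make_orthonormal_basis(dim, rng)
id_basis, motion_basis = split_subspaces(basis, n_identity)

source_feat = rng.standard_normal(dim)   # toy feature of the source face
driving_feat = rng.standard_normal(dim)  # toy feature of the driving face

# Identity descriptor from the source, motion descriptor from the driving face.
latent = recombine(
    describe(source_feat, id_basis),
    describe(driving_feat, motion_basis),
    id_basis,
    motion_basis,
)
```

Because the two subspaces are mutually orthogonal, projecting `latent` back onto the identity basis recovers the source coefficients and projecting onto the motion basis recovers the driving coefficients. In this toy setting, that is what it means for the descriptors to be decoupled.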
