CV, GR, LG
Jun 16, 2023

Unsupervised Learning of Style-Aware Facial Animation from Real Acting Performances

arXiv:2306.10006v3, 13 citations, h-index: 35
Originality: Incremental advance
AI Analysis

This work addresses the challenge of creating expressive, realistic facial animation for applications such as virtual avatars and entertainment. It is rated incremental because it builds on existing blend-shape and neural rendering techniques.

The paper tackles the problem of generating realistic facial animation from text or speech by learning to disentangle and synthesize different acting styles without supervision. A user study indicates large improvements in perceived quality over state-of-the-art methods.

This paper presents a novel approach for text/speech-driven animation of a photo-realistic head model based on blend-shape geometry, dynamic textures, and neural rendering. Training a VAE for geometry and texture yields a parametric model for accurate capture and realistic synthesis of facial expressions from a latent feature vector. Our animation method is based on a conditional CNN that transforms text or speech into a sequence of animation parameters. In contrast to previous approaches, our animation model learns to disentangle and synthesize different acting styles in an unsupervised manner, requiring only phonetic labels that describe the content of the training sequences. For realistic real-time rendering, we train a U-Net that refines rasterization-based renderings by computing improved pixel colors and a foreground matte. We compare our framework qualitatively and quantitatively against recent methods for head modeling as well as facial animation, and evaluate the perceived rendering and animation quality in a user study, which indicates large improvements compared to state-of-the-art approaches.
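
As a rough illustration of the blend-shape geometry the abstract mentions, the sketch below computes one animation frame as a neutral mesh plus a weighted sum of per-shape offsets. This is the generic linear blend-shape formulation, not the paper's actual model: the function name, the toy mesh, and the shape labels are all hypothetical, and in the described pipeline the weights would presumably come from the learned animation parameters rather than being set by hand.

```python
def blend_shape_geometry(neutral, deltas, weights):
    """Return neutral + sum_k weights[k] * deltas[k] for flattened vertex coordinates.

    Generic linear blend-shape model; all names here are illustrative assumptions.
    """
    if len(deltas) != len(weights):
        raise ValueError("one weight per blend shape is required")
    result = list(neutral)
    for delta, weight in zip(deltas, weights):
        if len(delta) != len(neutral):
            raise ValueError("each blend shape must match the neutral mesh size")
        for i, offset in enumerate(delta):
            result[i] += weight * offset
    return result


# Toy mesh: 2 vertices stored as flattened (x, y, z) coordinates.
neutral = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
deltas = [
    [0.0, 0.5, 0.0, 0.0, 0.5, 0.0],     # hypothetical "jaw open" offsets
    [0.25, 0.0, 0.0, -0.25, 0.0, 0.0],  # hypothetical "lip pucker" offsets
]

# One frame of animation parameters (hand-picked here for illustration).
frame = blend_shape_geometry(neutral, deltas, [1.0, 0.5])
print(frame)  # [0.125, 0.5, 0.0, 0.875, 0.5, 0.0]
```

A sequence of such weight vectors, one per video frame, would then drive the mesh over time before texturing and rendering.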

