CV | SD | AS | Nov 28, 2023

DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser

arXiv:2311.16565v2 | 22 citations | h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses personalized, efficient generation of 3D facial animation from speech, a task of interest in both academia and industry. It is an incremental improvement over existing diffusion-based methods.

The paper targets two limitations of diffusion-based speech-driven 3D facial animation: lack of personalization and slow generation. It proposes DiffusionTalker, which uses contrastive learning for personalization and knowledge distillation to cut generation from hundreds of diffusion steps to 8, and it reports outperforming state-of-the-art methods.

Speech-driven 3D facial animation has been an attractive task in both academia and industry. Traditional methods mostly focus on learning a deterministic mapping from speech to animation. Recent approaches have started to consider the non-deterministic nature of speech-driven 3D face animation and employ diffusion models for the task. However, personalizing facial animation and accelerating animation generation remain two major limitations of existing diffusion-based methods. To address these limitations, we propose DiffusionTalker, a diffusion-based method that uses contrastive learning to personalize 3D facial animation and knowledge distillation to accelerate 3D animation generation. Specifically, to enable personalization, we introduce a learnable talking identity to aggregate knowledge in audio sequences. The proposed identity embeddings extract customized facial cues across different people in a contrastive learning manner. During inference, users can obtain personalized facial animation based on input audio, reflecting a specific talking style. For acceleration, we distill a trained diffusion model with hundreds of steps into a lightweight model with 8 steps. Extensive experiments demonstrate that our method outperforms state-of-the-art methods. The code will be released.
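
The abstract names two mechanisms without giving formulas: contrastive learning over identity embeddings, and distillation from hundreds of steps down to 8. The sketch below is only one plausible reading, not the authors' implementation. It assumes a symmetric InfoNCE loss that pairs each identity embedding with the audio feature of the same speaker, and a schedule that halves the step count in each distillation round. The names `info_nce` and `halving_schedule`, the temperature of 0.07, and the 256-step teacher are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(identity_embs, audio_feats, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched pairs.

    Assumption: row i of identity_embs belongs to the same speaker as
    row i of audio_feats; every other pairing in the batch is a negative.
    """
    n = len(identity_embs)
    logits = [[cosine(e, a) / temperature for a in audio_feats]
              for e in identity_embs]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class of row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum - row[i]
        return total / n

    columns = [list(col) for col in zip(*logits)]
    # Average the identity-to-audio and audio-to-identity directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(columns))


def halving_schedule(teacher_steps, student_steps):
    """Step counts visited if each distillation round halves the sampler length.

    Assumption: halving per round is a hypothetical schedule; the abstract
    only states the start ("hundreds") and the end (8).
    """
    schedule = [teacher_steps]
    while schedule[-1] > student_steps:
        schedule.append(max(student_steps, schedule[-1] // 2))
    return schedule


# Correctly matched pairs give a far lower loss than shuffled pairs.
matched = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(matched < shuffled)        # True
print(halving_schedule(256, 8))  # [256, 128, 64, 32, 16, 8]
```

Under these assumptions, a 256-step teacher would reach the 8-step student in five halving rounds. The actual number of rounds depends on the teacher's real step count, which the abstract does not give.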
