CV AI LG MM SD AS
Sep 17, 2024

3DFacePolicy: Audio-Driven 3D Facial Animation Based on Action Control

Xuanmeng Sha, Liyun Zhang, Tomohiro Mashita, Naoya Chiba, Yuki Uranishi

arXiv:2409.10848v26.53 citations
h-index: 2

Originality: Incremental advance

AI Analysis

This work improves audio-driven 3D facial animation for applications like virtual avatars and entertainment, though it is incremental as it builds on existing control paradigms.

The paper tackles the problem of generating natural and continuous 3D facial animations from audio, addressing the limitations of frame-by-frame vertex generation in existing methods. According to the authors, the approach significantly outperforms state-of-the-art methods on the VOCASET and BIWI datasets, producing dynamic, expressive, and smooth animations.

Audio-driven 3D facial animation has achieved significant progress in both research and applications. Because recent baselines generate vertices frame by frame, they struggle to produce natural and continuous facial movements. We propose 3DFacePolicy, a pioneering work that introduces a novel definition of vertex trajectory changes across consecutive frames through the concept of "action". By predicting action sequences for each vertex that encode frame-to-frame movements, we reformulate the vertex generation approach into an action-based control paradigm. Specifically, we leverage a robotic control mechanism, diffusion policy, to predict action sequences conditioned on both audio and vertex states. Extensive experiments on the VOCASET and BIWI datasets demonstrate that our approach significantly outperforms state-of-the-art methods and is particularly adept at generating dynamic, expressive, and naturally smooth facial animations.
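
To make the "action" idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes an action is simply the per-vertex displacement between consecutive frames, which is one plausible reading of "frame-to-frame movements". The function names `vertices_to_actions` and `apply_actions` are hypothetical, and the toy mesh has only two vertices. In the paper the actions are predicted by a diffusion policy conditioned on audio and vertex states; here they are derived from known frames only to show how an action sequence and an initial state together determine the animation.

```python
from typing import List

Frame = List[List[float]]  # one frame: V vertices, each [x, y, z]


def vertices_to_actions(frames: List[Frame]) -> List[Frame]:
    """Assumed definition: action[t] = frame[t + 1] - frame[t], per vertex."""
    return [
        [[c1 - c0 for c0, c1 in zip(v0, v1)] for v0, v1 in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


def apply_actions(initial: Frame, actions: List[Frame]) -> List[Frame]:
    """Roll the animation forward by adding each action to the latest frame."""
    frames = [[list(v) for v in initial]]
    for action in actions:
        last = frames[-1]
        frames.append(
            [[c + d for c, d in zip(v, dv)] for v, dv in zip(last, action)]
        )
    return frames


# Toy sequence: 3 frames, 2 vertices (values chosen to be exact in floating point).
frames = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.5, 0.0], [1.0, 0.25, 0.0]],
    [[0.0, 0.75, 0.0], [1.0, 0.5, 0.0]],
]

actions = vertices_to_actions(frames)        # T frames give T - 1 actions
rebuilt = apply_actions(frames[0], actions)  # initial state plus actions recovers the frames
```

Under this assumed definition, each frame depends on the previous one, so the motion is described as a trajectory rather than as independent per-frame vertex positions.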

