CV | Oct 18, 2024

Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization

Bin Lin, Yanzhen Yu, Jianhao Ye, Ruitao Lv, Yuguang Yang, Ruoye Xie, Pan Yu, Hongbin Zhou

arXiv:2410.14283v1 | 8.74 citations | h-index: 8

Originality: Incremental advance

AI Analysis

This paper addresses the problem of generating realistic, controllable facial animations from audio, for applications such as virtual avatars and video production. It represents a strong incremental improvement, with specific gains in subtle expression transfer and lip-sync accuracy.

The paper tackles challenges in audio-driven facial animation, including expression leakage and poor synchronization, by introducing Takin-ADA, a two-stage method that improves subtle expression transfer and lip-sync accuracy. The authors report up to 42 FPS at 512x512 resolution on an RTX 4090 GPU and claim that the method outperforms existing commercial solutions.

Existing audio-driven facial animation methods face critical challenges, including expression leakage, ineffective subtle expression transfer, and imprecise audio-driven synchronization. We discovered that these issues stem from limitations in motion representation and the lack of fine-grained control over facial expressions. To address these problems, we present Takin-ADA, a novel two-stage approach for real-time audio-driven portrait animation. In the first stage, we introduce a specialized loss function that enhances subtle expression transfer while reducing unwanted expression leakage. The second stage utilizes an advanced audio processing technique to improve lip-sync accuracy. Our method not only generates precise lip movements but also allows flexible control over facial expressions and head motions. Takin-ADA achieves high-resolution (512x512) facial animations at up to 42 FPS on an RTX 4090 GPU, outperforming existing commercial solutions. Extensive experiments demonstrate that our model significantly surpasses previous methods in video quality, facial dynamics realism, and natural head movements, setting a new benchmark in the field of audio-driven facial animation.
