APB2Face: Audio-guided face reenactment with auxiliary pose and blink signals
This work addresses realistic audio-driven face animation for applications such as virtual avatars and video editing, though the contribution appears incremental: it builds on prior methods by adding auxiliary pose and blink signals.
The paper tackles photorealistic face generation from audio for face reenactment, where existing methods produce low-quality or low-resolution images. It proposes APB2Face, a deep neural network that combines audio with auxiliary head pose and blink signals, and reports better authenticity and controllability than state-of-the-art methods.
Audio-guided face reenactment aims to generate photorealistic faces from audio information while maintaining the same facial movement as when speaking to a real person. However, existing methods either cannot generate vivid face images or only reenact low-resolution faces, which limits their application value. To solve these problems, we propose a novel deep neural network named APB2Face, which consists of GeometryPredictor and FaceReenactor modules. GeometryPredictor uses extra head pose and blink state signals as well as audio to predict the latent landmark geometry information, while FaceReenactor takes the face landmark image as input to reenact the photorealistic face. A new dataset, AnnVI, collected from YouTube, is presented to support the approach, and experimental results indicate the superiority of our method over state-of-the-art methods in both authenticity and controllability.
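
The two-stage data flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: the feature sizes, the 68-point landmark count, the single random linear layer, and all function names are assumptions chosen for illustration, and the learned image generator is replaced by a pass-through placeholder. The sketch only shows how audio, head pose, and blink state are combined into landmark geometry, and how that geometry is turned into a landmark image for the second stage.

```python
import math
import random

# All sizes below are illustrative assumptions, not values from the paper.
NUM_LANDMARKS = 68   # assumed landmark count
AUDIO_DIM = 16       # assumed per-frame audio feature size
POSE_DIM = 3         # assumed head pose as (yaw, pitch, roll)
BLINK_DIM = 1        # assumed scalar blink state
IMG_SIZE = 32        # assumed landmark image resolution

def make_linear(in_dim, out_dim, seed=0):
    """Random weights standing in for a trained network (hypothetical)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(out_dim)]

def geometry_predictor(audio, pose, blink, weights):
    """Stage 1 stand-in: audio + pose + blink -> normalized (x, y) landmarks."""
    x = list(audio) + list(pose) + [blink]
    flat = [sum(w * v for w, v in zip(row, x)) for row in weights]
    coords = [1.0 / (1.0 + math.exp(-f)) for f in flat]  # squash into (0, 1)
    return [(coords[2 * i], coords[2 * i + 1]) for i in range(NUM_LANDMARKS)]

def rasterize_landmarks(landmarks, size=IMG_SIZE):
    """Draw each landmark as one pixel in a binary size x size image."""
    img = [[0] * size for _ in range(size)]
    for x, y in landmarks:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        img[row][col] = 1
    return img

def face_reenactor(landmark_image):
    """Stage 2 placeholder: a real model would generate a photorealistic
    face here; this stub only converts the landmark image to floats."""
    return [[float(p) for p in row] for row in landmark_image]

weights = make_linear(AUDIO_DIM + POSE_DIM + BLINK_DIM, 2 * NUM_LANDMARKS)
audio = [0.0] * AUDIO_DIM                      # silent frame (dummy input)
landmarks = geometry_predictor(audio, (0.0, 0.0, 0.0), 1.0, weights)
frame = face_reenactor(rasterize_landmarks(landmarks))
```

Because pose and blink enter the first stage as explicit inputs, changing them while holding the audio fixed changes the predicted geometry, which is the kind of controllability the abstract refers to.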