One-Shot Pose-Driving Face Animation Platform
This work addresses the need for one-shot, expressive face animation without fine-tuning for specific identities, though it appears incremental as it builds on existing methods.
The paper tackles the problem of generating expressive talking head videos from a single reference face. It refines an existing Image2Video model with a Face Locator and a Motion Frame mechanism, then optimizes the model on extensive human face video datasets to improve the quality and expressiveness of its output.
The objective of face animation is to generate dynamic and expressive talking head videos from a single reference face, using driving conditions derived from either video or audio inputs. Current approaches often require fine-tuning for specific identities and frequently fail to produce expressive videos because of the limited effectiveness of their Wav2Pose modules. To generate talking head videos in a one-shot manner and with better temporal continuity, we refine an existing Image2Video model by integrating a Face Locator and a Motion Frame mechanism. We then optimize the model on extensive human face video datasets, which significantly improves its ability to produce high-quality and expressive talking head videos. Additionally, we develop a demo platform with the Gradio framework that streamlines the process and lets users quickly create customized talking head videos.
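The abstract does not spell out how the Motion Frame mechanism works. A minimal sketch of one common reading follows, assuming that a long video is generated clip by clip and that the last few frames of each clip are passed as conditioning to the next one so that motion carries over across clip boundaries. The function names, the parameters (`clip_len`, `num_motion_frames`), and the integer stand-in for frames are hypothetical illustrations, not the authors' implementation.

```python
from typing import Callable, List, Sequence

# Placeholder frame type (assumption): a real system would use image tensors.
Frame = int


def generate_long_video(
    total_frames: int,
    clip_len: int,
    num_motion_frames: int,
    generate_clip: Callable[[Sequence[Frame], int, int], List[Frame]],
) -> List[Frame]:
    """Generate a video clip by clip, conditioning each clip on the tail of the previous one.

    generate_clip(motion_frames, start_index, length) is a hypothetical stand-in
    for the video model; it must return exactly `length` new frames.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    if num_motion_frames < 0:
        raise ValueError("num_motion_frames must be non-negative")

    video: List[Frame] = []
    while len(video) < total_frames:
        # The guard matters: video[-0:] would return the whole list, not an empty one.
        motion = video[-num_motion_frames:] if num_motion_frames > 0 else []
        length = min(clip_len, total_frames - len(video))
        clip = generate_clip(motion, len(video), length)
        if len(clip) != length:
            raise ValueError("generate_clip returned an unexpected number of frames")
        video.extend(clip)
    return video


def dummy_generate_clip(
    motion_frames: Sequence[Frame], start_index: int, length: int
) -> List[Frame]:
    """Toy generator: continues counting from the last motion frame, if any."""
    base = motion_frames[-1] + 1 if motion_frames else start_index
    return [base + i for i in range(length)]


if __name__ == "__main__":
    frames = generate_long_video(10, clip_len=4, num_motion_frames=2,
                                 generate_clip=dummy_generate_clip)
    print(frames)
```

With the toy generator, every clip after the first starts from the frames it receives rather than from scratch, which is the property the motion frames are assumed to provide here. The first clip receives an empty list because no earlier frames exist yet.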