CV, Jan 26

SelfieAvatar: Real-time Head Avatar reenactment from a Selfie Video

Wei Liang, Hui Yu, Derui Ding, Rachael E. Jack, Philippe G. Schyns

arXiv:2601.18851v1 1.5 FG

Originality: Incremental advance

AI Analysis

This work enables high-fidelity head avatar generation for applications like gaming and human-machine interaction, but it is incremental as it builds on existing 3DMM and GAN techniques.

The paper tackles the problem of creating realistic, animatable head avatars from a single selfie video. It addresses the difficulty of capturing the full head and fine-grained textures in real time, and reports better reconstruction with richer textures than existing methods.

Head avatar reenactment focuses on creating animatable personal avatars from monocular videos, serving as a foundational element for applications like social signal understanding, gaming, human-machine interaction, and computer vision. Recent advances in 3D Morphable Model (3DMM)-based facial reconstruction methods have achieved remarkably high-fidelity face estimation. However, on the one hand, they struggle to capture the entire head in real time, including non-facial regions and background details, which is essential for producing realistic, high-fidelity head avatars. On the other hand, recent approaches leveraging generative adversarial networks (GANs) for head avatar generation from videos can achieve high-quality reenactments but are limited in reproducing fine-grained head details, such as wrinkles and hair textures. In addition, existing methods generally rely on a large amount of training data, and rarely focus on using only a simple selfie video to achieve avatar reenactment. To address these challenges, this study introduces a method for detailed head avatar reenactment using a selfie video. The approach combines 3DMMs with a StyleGAN-based generator. A detailed reconstruction model is proposed, incorporating mixed loss functions for foreground reconstruction and avatar image generation during adversarial training to recover high-frequency details. Qualitative and quantitative evaluations on self-reenactment and cross-reenactment tasks demonstrate that the proposed method achieves superior head avatar reconstruction with rich and intricate textures compared to existing approaches.
