CV AI | Mar 2

LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation

Hualiang Wei, Shunran Jia, Jialun Liu, Wenhui Li

arXiv:2603.02129v1 | h-index: 1

Originality: Highly original

AI Analysis

This paper addresses the problem of creating high-quality, expression-controllable 3D avatars from everyday monocular videos, with applications in animation and virtual reality. It proposes a novel method for a known bottleneck.

The paper tackles the limited expressiveness and reconstruction artifacts of 3D Gaussian Splatting-based avatars, which stem from sparse kinematic cues in monocular videos. It introduces LiftAvatar, a method that completes sparse observations in kinematic space to drive high-fidelity avatar animation. The authors report substantial gains in animation quality and quantitative metrics, especially under extreme expressions.

We present LiftAvatar, a new paradigm that completes sparse monocular observations in kinematic space (e.g., facial expressions and head pose) and uses the completed signals to drive high-fidelity avatar animation. LiftAvatar is a fine-grained, expression-controllable large-scale video diffusion Transformer that synthesizes high-quality, temporally coherent expression sequences conditioned on single or multiple reference images. The key idea is to lift incomplete input data into a richer kinematic representation, thereby strengthening both reconstruction and animation in downstream 3D avatar pipelines. To this end, we introduce (i) a multi-granularity expression control scheme that combines shading maps with expression coefficients for precise and stable driving, and (ii) a multi-reference conditioning mechanism that aggregates complementary cues from multiple frames, enabling strong 3D consistency and controllability. As a plug-and-play enhancer, LiftAvatar directly addresses the limited expressiveness and reconstruction artifacts of 3D Gaussian Splatting-based avatars caused by sparse kinematic cues in everyday monocular videos. By expanding incomplete observations into diverse pose-expression variations, LiftAvatar also enables effective prior distillation from large-scale video generative models into 3D pipelines, leading to substantial gains. Extensive experiments show that LiftAvatar consistently boosts animation quality and quantitative metrics of state-of-the-art 3D avatar methods, especially under extreme, unseen expressions.
