CV · AI · Aug 12, 2025

X-UniMotion: Animating Human Images with Expressive, Unified and Identity-Agnostic Motion Latents

Guoxian Song, Hongyi Xu, Xiaochen Zhao, You Xie, Tianpei Gu, Zenan Li, Chenxu Zhang, Linjie Luo

arXiv:2508.09383v1 · 10 citations · h-index: 8 · SIGGRAPH Asia

Originality: Highly original

AI Analysis

The paper addresses high-fidelity cross-identity human animation, which has applications in entertainment, virtual reality, and human-computer interaction. It presents a novel method rather than an incremental improvement.

The method animates human images by transferring expressive whole-body motion across identities. Its core contribution, X-UniMotion, is a unified latent representation that encodes facial expressions, body poses, and hand gestures as identity-agnostic tokens. The authors report state-of-the-art results, with better motion fidelity and identity preservation than prior methods.
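The four-token layout can be pictured as a small fixed-size structure. The sketch below is a hypothetical illustration only: the class `MotionLatents`, the helper `conditioning_sequence`, the token width `TOKEN_DIM`, and the slot order are assumptions, not details from the paper. It shows the key property the summary describes: the tokens carry motion alone, so the same latents could be paired with any reference identity.

```python
from dataclasses import dataclass
from typing import List, Tuple

TOKEN_DIM = 8  # hypothetical token width; not a value taken from the paper

Token = Tuple[float, ...]


@dataclass(frozen=True)
class MotionLatents:
    """Hypothetical container for the four disentangled motion tokens."""

    face: Token        # facial expression
    body: Token        # body pose
    left_hand: Token   # left-hand gesture
    right_hand: Token  # right-hand gesture

    def __post_init__(self) -> None:
        # Reject tokens of the wrong width as soon as the container is built.
        for name, token in zip(self.slot_names(), self.as_sequence()):
            if len(token) != TOKEN_DIM:
                raise ValueError(
                    f"{name} token has length {len(token)}, expected {TOKEN_DIM}"
                )

    @staticmethod
    def slot_names() -> Tuple[str, ...]:
        return ("face", "body", "left_hand", "right_hand")

    def as_sequence(self) -> List[Token]:
        # Fixed slot order (an assumption) so a generator always sees the same layout.
        return [getattr(self, name) for name in self.slot_names()]


def conditioning_sequence(latents: MotionLatents) -> List[float]:
    """Flatten the four tokens into one vector, e.g. as conditioning for a generator."""
    return [value for token in latents.as_sequence() for value in token]


# Usage: an all-zero placeholder for every slot.
zero = tuple([0.0] * TOKEN_DIM)
latents = MotionLatents(face=zero, body=zero, left_hand=zero, right_hand=zero)
print(len(latents.as_sequence()))         # 4
print(len(conditioning_sequence(latents)))  # 32
```

Keeping one token per body part, rather than a single pooled vector, is what makes the representation "multi-granular" in this sketch: each slot can be inspected or validated independently.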

We present X-UniMotion, a unified and expressive implicit latent representation for whole-body human motion, encompassing facial expressions, body poses, and hand gestures. Unlike prior motion transfer methods that rely on explicit skeletal poses and heuristic cross-identity adjustments, our approach encodes multi-granular motion directly from a single image into a compact set of four disentangled latent tokens: one for facial expression, one for body pose, and one for each hand. These motion latents are both highly expressive and identity-agnostic, enabling high-fidelity, detailed cross-identity motion transfer across subjects with diverse identities, poses, and spatial configurations. To achieve this, we introduce a self-supervised, end-to-end framework that jointly learns the motion encoder and latent representation alongside a DiT-based video generative model, trained on large-scale, diverse human motion datasets. Motion-identity disentanglement is enforced via 2D spatial and color augmentations, as well as synthetic 3D renderings of cross-identity subject pairs under shared poses. Furthermore, we guide motion token learning with auxiliary decoders that promote fine-grained, semantically aligned, and depth-aware motion embeddings. Extensive experiments show that X-UniMotion outperforms state-of-the-art methods, producing highly expressive animations with superior motion fidelity and identity preservation.
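The abstract states that motion-identity disentanglement is enforced partly through 2D spatial and color augmentations. One common way to realize this in self-supervised reenactment, assumed here rather than taken from the paper, is to augment the driving frame before it reaches the motion encoder while keeping the un-augmented frame as the reconstruction target, so that position and color cues cannot leak through the motion tokens. The sketch uses plain nested lists as images; `translate`, `color_jitter`, `make_training_pair`, the shift range, and the gain range are all hypothetical.

```python
import random
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # RGB values in [0, 1]
Image = List[List[Pixel]]           # rows of pixels


def translate(img: Image, dx: int, dy: int, fill: Pixel = (0.0, 0.0, 0.0)) -> Image:
    """Shift the image by (dx, dy) pixels; uncovered pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def color_jitter(img: Image, gains: Pixel) -> Image:
    """Scale each RGB channel by its gain and clamp the result to [0, 1]."""
    return [
        [tuple(min(1.0, max(0.0, c * g)) for c, g in zip(px, gains)) for px in row]
        for row in img
    ]


def make_training_pair(driving: Image, rng: random.Random, max_shift: int = 2):
    """Return (motion_encoder_input, reconstruction_target).

    The encoder sees a shifted, recolored copy of the driving frame, but the
    generator must reproduce the original frame, so absolute position and
    color are useless to the motion tokens (hypothetical training setup).
    """
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    gains = tuple(rng.uniform(0.8, 1.2) for _ in range(3))
    encoder_input = color_jitter(translate(driving, dx, dy), gains)
    return encoder_input, driving
```

This covers only the 2D augmentation part; the synthetic 3D cross-identity renderings and auxiliary decoders mentioned in the abstract are not modeled here.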
