CV · Apr 20

MUA: Mobile Ultra-detailed Animatable Avatars

arXiv:2604.18583 · 81.3 · h-index: 25

Predicted impact: top 26% in CV (last 90 days) · Originality: Incremental advance

AI Analysis

For mobile VR/AR applications, this work enables real-time, high-fidelity animatable avatars on resource-constrained devices, bridging the gap between quality and efficiency.

MUA introduces a wavelet-guided, factorized blendshape representation that, relative to a high-fidelity teacher avatar, cuts computational cost by up to 2000× and model size by 10×. It reaches over 180 FPS on a desktop PC and 24 FPS on a Meta Quest 3 while preserving visual quality comparable to server-grade models.

Building photorealistic, animatable full-body digital humans remains a longstanding challenge in computer graphics and vision. Recent advances in animatable avatar modeling have largely progressed along two directions: improving the fidelity of dynamic geometry and appearance, or reducing computational complexity to enable deployment on resource-constrained platforms, e.g., VR headsets. However, existing approaches fail to achieve both goals simultaneously: ultra-high-fidelity avatars typically require substantial computation on server-class GPUs, whereas lightweight avatars often suffer from limited surface dynamics, reduced appearance detail, and noticeable artifacts. To bridge this gap, we propose a novel animatable avatar representation, termed Wavelet-guided Multi-level Spatial Factorized Blendshapes, and a corresponding distillation pipeline that transfers motion-aware clothing dynamics and fine-grained appearance details from a pre-trained ultra-high-quality avatar model into a compact, efficient representation. By coupling multi-level wavelet spectral decomposition with low-rank structural factorization in texture space, our method achieves up to 2000× lower computational cost and a 10× smaller model size than the original high-quality teacher avatar model, while preserving visually plausible dynamics and appearance details that closely resemble those of the teacher model. Extensive comparisons with state-of-the-art methods show that our approach significantly outperforms existing avatar approaches designed for mobile settings and achieves rendering quality comparable or superior to most approaches that can only run on servers. Importantly, our representation substantially improves the practicality of high-fidelity avatars for immersive applications, achieving over 180 FPS on a desktop PC and real-time native on-device performance at 24 FPS on a standalone Meta Quest 3.
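The abstract names two generic ingredients: a multi-level wavelet decomposition and a low-rank factorization, both applied in texture space. The listing does not give MUA's actual formulation, so the following is only a hypothetical NumPy sketch of those two ideas under assumptions of our own. It assumes a single-channel texture-space map per frame and an orthonormal Haar wavelet (the abstract does not say which wavelet is used). A truncated SVD per wavelet band stands in for whatever learned factorization the paper uses. The function names `haar2d`, `ihaar2d`, `factorize`, `reconstruct`, and `stored_params` are illustrative, not from the paper. The per-frame coefficients, which a pose-conditioned network would presumably predict at run time, are simply read off the SVD here.

```python
# Hypothetical sketch of "wavelet bands + per-band low-rank factorization";
# NOT the MUA implementation. Haar wavelet and truncated SVD are assumptions.
import numpy as np


def haar2d(x):
    """One level of an orthonormal 2D Haar transform over the last two axes."""
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2  # low-pass (coarse) band
    lh = (a - b + c - d) / 2  # horizontal detail
    hl = (a + b - c - d) / 2  # vertical detail
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, np.stack([lh, hl, hh], axis=-3)


def ihaar2d(ll, det):
    """Inverse of haar2d: rebuild the finer map from its four bands."""
    lh, hl, hh = det[..., 0, :, :], det[..., 1, :, :], det[..., 2, :, :]
    h, w = ll.shape[-2:]
    out = np.empty(ll.shape[:-2] + (2 * h, 2 * w))
    out[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[..., 0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[..., 1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out


def factorize(frames, levels, ranks):
    """Split (T, H, W) frames into `levels` Haar levels, then fit a truncated
    SVD per band. ranks[0] is for the coarsest approximation band, ranks[1:]
    for the detail levels ordered coarse to fine."""
    T = frames.shape[0]
    details, x = [], frames
    for _ in range(levels):
        x, det = haar2d(x)
        details.append(det)            # finest level is appended first
    bands = [x] + details[::-1]        # approximation, then coarse -> fine
    factors = []
    for band, r in zip(bands, ranks):
        m = band.reshape(T, -1)
        mean = m.mean(axis=0)
        u, s, vt = np.linalg.svd(m - mean, full_matrices=False)
        # Per-frame coefficients (T, r) and shared spatial basis (r, n).
        factors.append((band.shape[1:], mean, u[:, :r] * s[:r], vt[:r]))
    return factors


def reconstruct(factors):
    """Decode each band from its low-rank factors, then invert the wavelets."""
    bands = [(mean + coeffs @ basis).reshape((-1,) + shape)
             for shape, mean, coeffs, basis in factors]
    x = bands[0]
    for det in bands[1:]:              # coarse -> fine
        x = ihaar2d(x, det)
    return x


def stored_params(factors):
    """Spatial parameters kept on device (means + bases), excluding coeffs."""
    return sum(mean.size + basis.size for _, mean, _, basis in factors)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, H, W, K = 20, 32, 32, 3
    # Synthetic stand-in for pose-dependent texture-space detail:
    # K fixed spatial patterns mixed by per-frame weights.
    patterns = rng.standard_normal((K, H, W))
    frames = np.einsum("tk,khw->thw", rng.standard_normal((T, K)), patterns)
    factors = factorize(frames, levels=3, ranks=[K] * 4)
    print("max error:", np.abs(reconstruct(factors) - frames).max())
    print("stored params:", stored_params(factors), "vs dense frames:", frames.size)
```

Because the Haar transform is linear and orthonormal, each band's centered coefficient matrix has rank no higher than the centered input data. In the synthetic demo, a rank of 3 per band is therefore lossless. On real clothing dynamics the ranks would be chosen per level to trade accuracy against size and compute, but how MUA allocates them across wavelet levels is not described in this summary.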

