CV · May 30, 2025

AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion

arXiv:2505.24877v1 · 6 citations · h-index: 40
Originality: Highly original
AI Analysis

The work addresses the need for high-fidelity, animatable avatars in real-world applications such as gaming and virtual reality. It represents a strong, targeted advance rather than a foundational breakthrough.

The paper tackles the problem of generating detailed, animation-ready 3D human avatars from a single image, achieving state-of-the-art performance in reconstruction and reposing.

Existing methods for image-to-3D avatar generation struggle to produce highly detailed, animation-ready avatars suitable for real-world applications. We introduce AdaHuman, a novel framework that generates high-fidelity animatable 3D avatars from a single in-the-wild image. AdaHuman incorporates two key innovations: (1) A pose-conditioned 3D joint diffusion model that synthesizes consistent multi-view images in arbitrary poses alongside a corresponding 3D Gaussian Splatting (3DGS) reconstruction at each diffusion step; (2) A compositional 3DGS refinement module that enhances the details of local body parts through image-to-image refinement and seamlessly integrates them using a novel crop-aware camera ray map, producing a cohesive detailed 3D avatar. These components allow AdaHuman to generate highly realistic standardized A-pose avatars with minimal self-occlusion, enabling rigging and animation with any input motion. Extensive evaluation on public benchmarks and in-the-wild images demonstrates that AdaHuman significantly outperforms state-of-the-art methods in both avatar reconstruction and reposing. Code and models will be publicly available for research purposes.
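The abstract names a crop-aware camera ray map but does not give its formulation. One plausible reading, sketched here purely as an assumption, is that each zoomed-in body-part crop receives a per-pixel ray map computed from the full-image camera, with the intrinsics shifted by the crop offset and scaled by the resize factor, so that refined local views stay registered to the global avatar. The helper names `crop_intrinsics` and `ray_map`, the pinhole camera model, and the 6-channel Plücker (direction, moment) encoding are hypothetical choices for illustration, not the paper's code.

```python
import numpy as np


def crop_intrinsics(K, x0, y0, crop_w, crop_h, out_w, out_h):
    """Hypothetical helper: adjust 3x3 pinhole intrinsics K for a crop whose
    top-left corner is at pixel (x0, y0) in the full image, of size
    crop_w x crop_h, resized to out_w x out_h."""
    sx, sy = out_w / crop_w, out_h / crop_h
    K_new = K.astype(float).copy()
    K_new[0, 0] *= sx                      # fx scales with the resize
    K_new[1, 1] *= sy                      # fy scales with the resize
    K_new[0, 2] = (K[0, 2] - x0) * sx      # principal point: shift by crop offset, then scale
    K_new[1, 2] = (K[1, 2] - y0) * sy
    return K_new


def ray_map(K, c2w, width, height):
    """Hypothetical helper: per-pixel Plücker ray map of shape (H, W, 6),
    channels = (unit direction, moment = origin x direction) in world space.
    Assumes a camera-to-world matrix c2w and pixel centers at +0.5."""
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5, indexing="xy")
    # Back-project pixel centers to camera-space directions (z = 1 plane).
    dirs_cam = np.stack(
        [(u - K[0, 2]) / K[0, 0], (v - K[1, 2]) / K[1, 1], np.ones_like(u)], axis=-1
    )
    R, t = c2w[:3, :3], c2w[:3, 3]
    dirs = dirs_cam @ R.T                  # rotate into world space
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    origins = np.broadcast_to(t, dirs.shape)
    moments = np.cross(origins, dirs)      # Plücker moment encodes the ray's position
    return np.concatenate([dirs, moments], axis=-1)
```

Under these assumptions, the ray map of an un-resized crop is exactly the matching sub-window of the full-image ray map, so a refinement network conditioned on it can tell where a zoomed-in patch (for example, the face or a hand) sits relative to the whole body. When the crop is upsampled, the adjusted focal length makes neighbouring pixels carry proportionally closer rays, which is the information a plain, crop-agnostic ray map would lose.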
