CV · Jun 1

HumanNOVA: Photorealistic, Universal and Rapid 3D Human Avatar Modeling from a Single Image

arXiv:2606.02573 · 0.35
AI Analysis · 55

This work addresses the need for fast, high-quality 3D avatar creation from a single image, which is important for applications in VR/AR, gaming, and telepresence.

HumanNOVA generates photorealistic 3D human avatars from a single RGB image in under one second, achieving state-of-the-art results across multiple benchmarks by scaling training data to 100k assets via a novel data pipeline.

In this paper, we present HumanNOVA, a photorealistic, universal, and rapid model for generating 3D human avatars from a single RGB image. Achieving both photorealism and generalization is challenging due to the scarcity of diverse, high-quality 3D human data. To address this, we build a scalable data generation pipeline that follows two strategies. The first is to leverage existing rigged assets and animate them with extensive poses from daily life. The second is to utilize existing multi-camera captures of humans and employ fitting to generate more diverse views for training. These two strategies enable us to scale up to 100k assets, significantly enhancing both the quantity and the diversity of data for robust model training. In terms of architecture, HumanNOVA adopts a feed-forward, token-conditioned avatar modeling framework that allows inference in less than one second and requires no test-time optimization. Given an input image and an estimated simplified human mesh (SMPL) without detailed geometry or appearance, the model first encodes both inputs into compact token representations. These tokens then act as conditioning signals and are fused through cross-attention to construct a triplane-based 3D avatar representation. Extensive experiments on multiple benchmarks demonstrate the superiority of our approach, both quantitatively and qualitatively, as well as its robustness under diverse input image conditions. Project page: https://HumanNOVA.github.io
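The fusion step described in the abstract (image and SMPL tokens conditioning a triplane through cross-attention) can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`cross_attention`, `build_triplane`, `query_triplane`), the random stand-in for learnable plane queries, the absence of learned projections and multiple heads, the nearest-cell lookup, and the summation of the three plane features are all assumptions made for illustration.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, context):
    # Single-head scaled dot-product cross-attention with no learned
    # projections (an illustrative simplification). Each output row is a
    # weighted average of the context tokens.
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(d)])
    return out

def build_triplane(image_tokens, smpl_tokens, res, dim, seed=0):
    # Hypothetical plane queries: one vector per cell of the three planes.
    # A trained model would learn these; here they are random placeholders.
    rng = random.Random(seed)
    queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3 * res * res)]
    # Both token sets act as conditioning signals for the same queries.
    fused = cross_attention(queries, image_tokens + smpl_tokens)
    planes = []
    for p in range(3):  # plane order assumed here: XY, XZ, YZ
        flat = fused[p * res * res:(p + 1) * res * res]
        planes.append([flat[r * res:(r + 1) * res] for r in range(res)])
    return planes

def query_triplane(planes, x, y, z):
    # Nearest-cell lookup for a point in [-1, 1]^3. The three plane features
    # are summed (an assumed aggregation choice).
    res = len(planes[0])
    def idx(t):
        return min(res - 1, max(0, int((t + 1) / 2 * res)))
    f_xy = planes[0][idx(y)][idx(x)]
    f_xz = planes[1][idx(z)][idx(x)]
    f_yz = planes[2][idx(z)][idx(y)]
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]

# Toy usage: 16 image tokens and 4 SMPL tokens of width 8, a 4x4 triplane.
rng = random.Random(1)
image_tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]
smpl_tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
planes = build_triplane(image_tokens, smpl_tokens, res=4, dim=8)
feature = query_triplane(planes, 0.1, -0.3, 0.7)
```

In a full system the per-point feature would presumably be passed to a decoder that predicts geometry and appearance for rendering; that stage is omitted here because the abstract does not describe it.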

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it, not by global fame.
