MultiGO++: Monocular 3D Clothed Human Reconstruction via Geometry-Texture Collaboration
This work provides a more accurate and realistic method for reconstructing 3D clothed humans from a single image, which benefits creators and developers working on virtual reality, gaming, and fashion applications.
This paper addresses the problem of reconstructing a complete and realistic textured 3D human avatar from a single image. The authors propose MultiGO++, a framework that leverages geometry-texture collaboration to overcome limitations in existing methods, resulting in superior reconstruction quality compared to state-of-the-art approaches on two benchmarks and in-the-wild cases.
Monocular 3D clothed human reconstruction aims to generate a complete and realistic textured 3D avatar from a single image. Existing methods are commonly trained under multi-view supervision with annotated geometric priors; during inference, these priors are estimated from the monocular input by a pre-trained network. These methods are constrained by three key limitations: texturally, by the unavailability of training data; geometrically, by inaccurate external priors; and systematically, by biased single-modality supervision. All three lead to suboptimal reconstruction. To address these issues, we propose a novel reconstruction framework, named MultiGO++, which achieves effective systematic geometry-texture collaboration. It consists of three core parts: (1) a multi-source texture synthesis strategy that constructs 15,000+ textured 3D human scans to improve texture estimation quality in challenging scenarios; (2) a region-aware shape extraction module that extracts features from each body region and models their interactions to obtain geometry information, together with a Fourier geometry encoder that mitigates the modality gap to achieve effective geometry learning; and (3) a dual reconstruction U-Net that leverages geometry-texture collaborative features to refine and generate high-fidelity textured 3D human meshes. Extensive experiments on two benchmarks and many in-the-wild cases show the superiority of our method over state-of-the-art approaches.
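The abstract does not specify how the Fourier geometry encoder is built. As a rough illustration of the general idea of Fourier features applied to 3D geometry, the sketch below maps each point coordinate to sines and cosines at several frequencies. The function name `fourier_encode`, the parameter `num_bands`, and the power-of-two frequency schedule are all assumptions made for this example and are not taken from MultiGO++.

```python
import numpy as np


def fourier_encode(points, num_bands=4):
    """Generic Fourier feature encoding of 3D points (illustrative only).

    points:    array of shape (N, 3), e.g. body-surface vertices.
    num_bands: number of frequency bands per coordinate (assumed value).
    Returns an array of shape (N, 3 * 2 * num_bands).
    """
    points = np.asarray(points, dtype=np.float64)
    # Assumed frequency schedule: pi * 2^k for k = 0 .. num_bands - 1.
    freqs = (2.0 ** np.arange(num_bands)) * np.pi
    # Broadcast each coordinate against every frequency: (N, 3, num_bands).
    angles = points[:, :, None] * freqs
    # Concatenate sine and cosine responses: (N, 3, 2 * num_bands).
    features = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    # Flatten the per-coordinate features into one vector per point.
    return features.reshape(points.shape[0], -1)


if __name__ == "__main__":
    vertices = np.random.default_rng(0).uniform(-1.0, 1.0, size=(5, 3))
    print(fourier_encode(vertices).shape)
```

An encoding of this kind turns sparse point coordinates into a dense, multi-frequency representation, which is one common way to bring geometric inputs closer to the feature space of image-based networks. Whether MultiGO++ uses this exact formulation is not stated in the abstract.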