CV · Nov 27, 2023

SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion

arXiv:2311.15855v2 · 94 citations · h-index: 42
Originality: Highly original
AI Analysis

The work addresses the challenge of inferring full 3D human detail from limited views, with applications in graphics and VR, and represents a novel integration of generative models into reconstruction workflows.

The paper tackles detailed 3D human reconstruction from single-view images by proposing SiTH, a pipeline that uses an image-conditioned diffusion model to hallucinate the unseen back-view appearance and skinned body meshes to guide reconstruction. The authors report superior accuracy and perceptual quality on two benchmarks.

A long-standing goal of 3D human reconstruction is to create lifelike and fully detailed 3D humans from single-view images. The main challenge lies in inferring unknown body shapes, appearances, and clothing details in areas not visible in the images. To address this, we propose SiTH, a novel pipeline that uniquely integrates an image-conditioned diffusion model into a 3D mesh reconstruction workflow. At the core of our method lies the decomposition of the challenging single-view reconstruction problem into generative hallucination and reconstruction subproblems. For the former, we employ a powerful generative diffusion model to hallucinate unseen back-view appearance based on the input images. For the latter, we leverage skinned body meshes as guidance to recover full-body texture meshes from the input and back-view images. SiTH requires as few as 500 3D human scans for training while maintaining its generality and robustness to diverse images. Extensive evaluations on two 3D human benchmarks, including our newly created one, highlighted our method's superior accuracy and perceptual quality in 3D textured human reconstruction. Our code and evaluation benchmark are available at https://ait.ethz.ch/sith
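The decomposition described in the abstract (first hallucinate the back view, then reconstruct a textured mesh from the front and back views with a body mesh as guidance) can be illustrated with a toy sketch. This is not the authors' code: every name here (`hallucinate_back_view`, `reconstruct_mesh`, `single_view_pipeline`, `TexturedMesh`) is hypothetical, images are nested lists, the diffusion model is replaced by a simple horizontal mirror, and the body mesh is reduced to a placeholder vertex count. Only the two-stage control flow reflects the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy stand-in for an image: rows of pixel intensities (assumption for illustration).
Image = List[List[float]]


@dataclass
class TexturedMesh:
    """Hypothetical container for the reconstruction output."""
    num_vertices: int      # placeholder for the guiding body mesh geometry
    front_texture: Image   # texture taken from the input view
    back_texture: Image    # texture taken from the hallucinated back view


def hallucinate_back_view(front: Image) -> Image:
    """Stage 1 stand-in. SiTH uses an image-conditioned diffusion model here;
    this toy version only mirrors the input horizontally."""
    return [list(reversed(row)) for row in front]


def reconstruct_mesh(front: Image, back: Image, num_vertices: int) -> TexturedMesh:
    """Stage 2 stand-in. SiTH recovers a textured mesh guided by a skinned
    body mesh; this toy version only bundles the inputs together."""
    return TexturedMesh(num_vertices, front, back)


def single_view_pipeline(
    front: Image,
    num_vertices: int,
    hallucinate: Callable[[Image], Image] = hallucinate_back_view,
    reconstruct: Callable[[Image, Image, int], TexturedMesh] = reconstruct_mesh,
) -> TexturedMesh:
    """Run the two subproblems in sequence: hallucination, then reconstruction."""
    back = hallucinate(front)
    return reconstruct(front, back, num_vertices)


if __name__ == "__main__":
    front_view = [[1.0, 2.0], [3.0, 4.0]]
    mesh = single_view_pipeline(front_view, num_vertices=100)
    print(mesh.back_texture)
```

Passing the two stages as callables keeps them independently replaceable, which mirrors the abstract's framing of hallucination and reconstruction as separate subproblems.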

Code Implementations: 1 repo
