CrowdGaussian: Reconstructing High-Fidelity 3D Gaussians for Human Crowd from a Single Image
This work addresses the multi-person scenario in single-view 3D human reconstruction, which is more prevalent in practice yet previously neglected; the contribution is incremental over existing individual-focused methods.
The paper tackles the reconstruction of 3D human crowd models from a single image, a problem made difficult by occlusions, low image clarity, and varied appearances. It proposes CrowdGaussian, which directly reconstructs multi-person 3D Gaussian Splatting representations and achieves photorealistic, geometrically coherent results.
Single-view 3D human reconstruction has garnered significant attention in recent years. Despite numerous advancements, prior research has concentrated on reconstructing 3D models from clear, close-up images of individual subjects, often yielding subpar results in the more prevalent multi-person scenarios. Reconstructing 3D human crowd models is a highly intricate task, with challenges including 1) extensive occlusions, 2) low clarity, and 3) numerous and varied appearances. To address this task, we propose CrowdGaussian, a unified framework that directly reconstructs multi-person 3D Gaussian Splatting (3DGS) representations from single-image inputs. To handle occlusions, we devise a self-supervised adaptation pipeline that enables a pretrained large human model to reconstruct complete 3D humans with plausible geometry and appearance from heavily occluded inputs. Furthermore, we introduce Self-Calibrated Learning (SCL), a training strategy that enables single-step diffusion models to adaptively refine coarse renderings to optimal quality by blending identity-preserving samples with clean/corrupted image pairs. The refined outputs can be distilled back to enhance the quality of the multi-person 3DGS representations. Extensive experiments demonstrate that CrowdGaussian generates photorealistic, geometrically coherent reconstructions of multi-person scenes.
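The blending of sample types described for SCL can be illustrated with a minimal sketch. It rests on one possible reading of the abstract, not on the authors' implementation: "identity-preserving samples" are taken to be pairs whose input already equals the target (so the refiner learns to leave good renderings alone), while clean/corrupted pairs ask it to restore a degraded input. The names `corrupt`, `build_mixed_batch`, `identity_ratio`, and `noise_scale` are hypothetical, images are stood in for by flat lists of pixel values in [0, 1], and the uniform-noise degradation is only a placeholder.

```python
import random


def corrupt(image, noise_scale, rng):
    """Hypothetical degradation: add uniform noise, then clamp to [0, 1]."""
    return [
        min(1.0, max(0.0, p + rng.uniform(-noise_scale, noise_scale)))
        for p in image
    ]


def build_mixed_batch(clean_images, identity_ratio=0.5, noise_scale=0.2, seed=0):
    """Return (input, target, kind) triples mixing two sample types.

    Assumed reading of the abstract, not the paper's actual procedure:
      "identity": input == target, so no change should be learned.
      "restore":  input is a corrupted copy and the target is the clean image.
    """
    rng = random.Random(seed)
    batch = []
    for image in clean_images:
        if rng.random() < identity_ratio:
            batch.append((list(image), list(image), "identity"))
        else:
            batch.append((corrupt(image, noise_scale, rng), list(image), "restore"))
    return batch


# Toy "images": four flat lists of three pixel values each.
clean = [[0.1, 0.5, 0.9], [0.3, 0.3, 0.3], [0.0, 1.0, 0.5], [0.7, 0.2, 0.4]]
batch = build_mixed_batch(clean, identity_ratio=0.5)
for model_input, target, kind in batch:
    print(kind, model_input, target)
```

In this sketch `identity_ratio` controls how often the refiner would see an input that needs no correction. How the paper actually balances or schedules the two sample types is not stated in the abstract.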