Capture, Canonicalize, Splat: Zero-Shot 3D Gaussian Avatars from Unstructured Phone Images
This work addresses identity-preserving 3D avatar creation from casual photos with a zero-shot pipeline that improves over single-view and synthetic-data methods, although the contribution is incremental because it builds on existing Gaussian splatting techniques.
The paper tackles the problem of creating hyperrealistic 3D avatars from unstructured phone images. It introduces a zero-shot pipeline that first processes multiple views into a consistent representation and then applies a transformer model trained on high-fidelity data. The result is static quarter-body avatars with compelling realism and robust identity preservation.
We present a novel, zero-shot pipeline for creating hyperrealistic, identity-preserving 3D avatars from a few unstructured phone images. Existing methods face several challenges: single-view approaches suffer from geometric inconsistencies and hallucinations, degrading identity preservation, while models trained on synthetic data fail to capture high-frequency details like skin wrinkles and fine hair, limiting realism. Our method introduces two key contributions: (1) a generative canonicalization module that processes multiple unstructured views into a standardized, consistent representation, and (2) a transformer-based model trained on a new, large-scale dataset of high-fidelity Gaussian splatting avatars derived from dome captures of real people. This "Capture, Canonicalize, Splat" pipeline produces static quarter-body avatars with compelling realism and robust identity preservation from unstructured photos.
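To make the data flow of the three stages concrete, the following is a minimal sketch and not the authors' implementation. Every name in it (`Image`, `Gaussian`, `build_avatar`, `toy_canonicalize`, `toy_predict_splats`) is hypothetical. The abstract does not describe the form of the canonical representation, so modeling it as a fixed list of views is an assumption, and the `Gaussian` fields follow the parameterization commonly used in Gaussian splatting rather than anything stated here. The two `toy_*` functions are placeholders standing in for the generative canonicalization module and the transformer-based model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Image:
    """Stand-in for one photo; pixel data is omitted in this sketch."""
    label: str


@dataclass
class Gaussian:
    """One 3D Gaussian primitive (common splatting parameterization, assumed)."""
    mean: Tuple[float, float, float]             # center in 3D space
    scale: Tuple[float, float, float]            # per-axis extent
    rotation: Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)
    opacity: float
    color: Tuple[float, float, float]            # RGB in [0, 1]


def build_avatar(
    photos: Sequence[Image],
    canonicalize: Callable[[Sequence[Image]], List[Image]],
    predict_splats: Callable[[Sequence[Image]], List[Gaussian]],
) -> List[Gaussian]:
    """Capture -> Canonicalize -> Splat, with both learned stages injected."""
    if len(photos) == 0:
        raise ValueError("at least one captured photo is required")
    # Stage 2: map unstructured views to a standardized representation.
    canonical_views = canonicalize(photos)
    # Stage 3: predict a static set of Gaussians from the canonical views.
    return predict_splats(canonical_views)


def toy_canonicalize(photos: Sequence[Image]) -> List[Image]:
    # Placeholder only: emits a fixed set of view labels regardless of input.
    return [Image(label=f"canonical_{name}") for name in ("front", "left", "right")]


def toy_predict_splats(views: Sequence[Image]) -> List[Gaussian]:
    # Placeholder only: one identity-rotation Gaussian per canonical view.
    return [
        Gaussian(
            mean=(float(i), 0.0, 0.0),
            scale=(1.0, 1.0, 1.0),
            rotation=(1.0, 0.0, 0.0, 0.0),
            opacity=1.0,
            color=(0.5, 0.5, 0.5),
        )
        for i, _ in enumerate(views)
    ]


if __name__ == "__main__":
    captured = [Image("phone_photo_1"), Image("phone_photo_2")]
    avatar = build_avatar(captured, toy_canonicalize, toy_predict_splats)
    print(len(avatar), "Gaussians in the toy avatar")
```

Passing the two stages in as callables keeps the sketch independent of any particular model. It illustrates only the order of operations and the fact that the output is a static set of Gaussians.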