SPGen: Spherical Projection as Consistent and Flexible Representation for Single Image 3D Shape Generation
This work addresses the problem of generating consistent and flexible 3D shapes from single images, a known bottleneck in computer vision and graphics applications.
The paper tackles the problem of inconsistent multi-view reconstructions and limited representation of complex internal structures in single-image 3D shape generation by introducing Spherical Projection (SP), which projects geometry onto a sphere and unwraps it into a 2D representation. The result is SPGen, which significantly outperforms existing baselines in geometric quality and computational efficiency.
Existing single-view 3D generative models typically adopt multiview diffusion priors to reconstruct object surfaces, yet they remain prone to inter-view inconsistencies and are unable to faithfully represent complex internal structures or nontrivial topologies. To address these limitations, we encode geometry information by projecting it onto a bounding sphere and unwrapping it into a compact, structured multi-layer 2D Spherical Projection (SP) representation. Operating solely in the image domain, SPGen offers three key advantages simultaneously: (1) Consistency. The injective SP mapping encodes surface geometry from a single viewpoint, which naturally eliminates view inconsistency and ambiguity; (2) Flexibility. Multi-layer SP maps represent nested internal structures and support direct lifting to watertight or open 3D surfaces; (3) Efficiency. The image-domain formulation allows the direct inheritance of powerful 2D diffusion priors and enables efficient finetuning with limited computational resources. Extensive experiments demonstrate that SPGen significantly outperforms existing baselines in geometric quality and computational efficiency.
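To make the idea of projecting onto a bounding sphere and unwrapping into multi-layer 2D maps more concrete, the following is a minimal sketch and not the paper's implementation. It assumes the geometry is given as surface sample points centered at the origin, uses an equirectangular (polar angle, azimuth) grid, stores the radius of each surface crossing per pixel, and treats 0 as "empty". The function names `spherical_projection` and `lift`, the resolution, the layer count, and the tolerance `tol` are all hypothetical choices for illustration.

```python
import numpy as np

def spherical_projection(points, height=32, width=64, layers=2, tol=1e-3):
    """Bin 3D points into a multi-layer equirectangular radius map.

    Layer 0 holds the outermost radius seen in each pixel, layer 1 the next
    distinct radius further inside, and so on. A value of 0 marks an empty
    pixel (an assumption of this sketch).
    """
    r = np.linalg.norm(points, axis=1)
    # Polar angle in [0, pi] and azimuth in [-pi, pi].
    theta = np.arccos(np.clip(points[:, 2] / np.maximum(r, 1e-12), -1.0, 1.0))
    phi = np.arctan2(points[:, 1], points[:, 0])
    row = np.minimum((theta / np.pi * height).astype(int), height - 1)
    col = np.minimum(((phi + np.pi) / (2 * np.pi) * width).astype(int), width - 1)

    sp = np.zeros((layers, height, width))
    count = np.zeros((height, width), dtype=int)
    # Visit points from the outside in, so layers are filled in depth order.
    for i in np.argsort(-r):
        rr, cc = row[i], col[i]
        k = count[rr, cc]
        if k > 0 and sp[k - 1, rr, cc] - r[i] < tol:
            continue  # same surface crossing as the layer already stored
        if k < layers:
            sp[k, rr, cc] = r[i]
            count[rr, cc] = k + 1
    return sp

def lift(sp_layer):
    """Lift one SP layer back to 3D points at the pixel-center directions."""
    height, width = sp_layer.shape
    rows, cols = np.nonzero(sp_layer)
    theta = (rows + 0.5) / height * np.pi
    phi = (cols + 0.5) / width * 2 * np.pi - np.pi
    rad = sp_layer[rows, cols]
    return np.stack([rad * np.sin(theta) * np.cos(phi),
                     rad * np.sin(theta) * np.sin(phi),
                     rad * np.cos(theta)], axis=1)

# Toy input: two nested spheres sampled along the same random directions.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(5000, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
nested = np.concatenate([0.5 * dirs, 0.25 * dirs])

sp_maps = spherical_projection(nested)   # shape (2, 32, 64)
outer_points = lift(sp_maps[0])          # points on the outer shell
```

In this toy case the outer shell lands in layer 0 and the inner shell in layer 1, which illustrates how a multi-layer map can hold nested structure that a single depth image from one camera view could not. A real pipeline would more plausibly obtain the per-pixel crossings by casting rays against a mesh rather than by binning point samples.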