CV · LG · Dec 13, 2024

unPIC: A Geometric Multiview Prior for Image to 3D Synthesis

arXiv:2412.10273v2 · h-index: 20
Originality: Incremental advance
AI Analysis

This work addresses image-to-3D synthesis, a problem relevant to both computer vision and graphics, and represents an incremental advance built around a geometry-driven method.

The paper tackles the problem of generating multiple novel views of an object (multiview 3D) from a single 2D image. It introduces a hierarchical probabilistic approach: a diffusion prior predicts the unseen 3D geometry, and a diffusion decoder conditioned on that geometry synthesizes the novel views. The method outperforms baselines such as CAT3D and EscherNet on held-out ObjaverseXL objects and on unseen real-world objects.
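One way to read "hierarchical probabilistic approach" is as a two-level latent-variable model. The notation below is an assumption for illustration, not taken from the paper: x is the input image, g_{1:N} the predicted geometry for N target views, and y_{1:N} the generated target views. We also assume the decoder sees the input image alongside the geometry.

```latex
p(y_{1:N} \mid x) = \int p_{\text{dec}}(y_{1:N} \mid g_{1:N}, x)\, p_{\text{prior}}(g_{1:N} \mid x)\, \mathrm{d}g_{1:N}
% Ancestral sampling: first g_{1:N} \sim p_{\text{prior}}(\cdot \mid x), then y_{1:N} \sim p_{\text{dec}}(\cdot \mid g_{1:N}, x)
```

Under this reading, the prior never has to render appearance and the decoder never has to infer unseen shape; each stage solves one sub-problem.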

We introduce a hierarchical probabilistic approach to go from a 2D image to multiview 3D: a diffusion "prior" predicts the unseen 3D geometry, which then conditions a diffusion "decoder" to generate novel views of the subject. We use a pointmap-based geometric representation to coordinate the generation of multiple target views simultaneously. We construct a predictable distribution of geometric features per target view to enable learnability across examples and generalization to arbitrary input images. Our modular, geometry-driven approach to novel-view synthesis (called "unPIC") beats competing baselines such as CAT3D, EscherNet, Free3D, and One-2-3-45 on held-out objects from ObjaverseXL, as well as unseen real-world objects from Google Scanned Objects, Amazon Berkeley Objects, and the Digital Twin Catalog.
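A pointmap stores, for every pixel of a view, the 3D point that pixel sees, expressed in a shared coordinate frame, so pointmaps for different target views can be compared directly. The sketch below shows a generic way to build one from a pinhole depth map; it is an illustration under stated assumptions, not unPIC's actual geometric features. The function name `depth_to_pointmap`, the intrinsics `fx, fy, cx, cy`, and the 4x4 `cam_to_world` pose are all hypothetical inputs chosen for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_pointmap(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    cam_to_world: Sequence[Sequence[float]],
) -> List[List[Point]]:
    """Hypothetical helper: back-project a pinhole depth map into a world-frame pointmap.

    depth[v][u] is the z-depth of pixel (u, v) in camera coordinates.
    cam_to_world is an assumed 4x4 rigid transform (row-major, last row [0, 0, 0, 1]).
    Returns an H x W grid of (x, y, z) points in the shared world frame.
    """
    pointmap: List[List[Point]] = []
    for v, row in enumerate(depth):
        out_row: List[Point] = []
        for u, z in enumerate(row):
            # Pinhole back-projection of pixel (u, v) at depth z into the camera frame.
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Rigid transform into the shared frame: world = R @ cam + t.
            x, y, w = (
                sum(cam_to_world[i][j] * cam[j] for j in range(3)) + cam_to_world[i][3]
                for i in range(3)
            )
            out_row.append((x, y, w))
        pointmap.append(out_row)
    return pointmap
```

Because every view's pointmap lives in the same frame, a pixel in one target view and a pixel in another that see the same surface point carry the same coordinates, which is the kind of cross-view coordination a shared geometric representation makes possible.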
