CV | Oct 4, 2023

Consistent-1-to-3: Consistent Image to 3D View Synthesis via Geometry-aware Diffusion Models

arXiv:2310.03020v2 | 91 citations | h-index: 10
AI Analysis

This work addresses the problem of generating 3D-consistent novel views from a single image for 3D object understanding, and represents an incremental improvement over existing methods.

The paper tackles the problem of zero-shot novel view synthesis from a single image by addressing 3D consistency issues, resulting in a generative framework that enables full 360-degree observation with improved consistency across views.

Zero-shot novel view synthesis (NVS) from a single image is an essential problem in 3D object understanding. While recent approaches that leverage pre-trained generative models can synthesize high-quality novel views from in-the-wild inputs, they still struggle to maintain 3D consistency across different views. In this paper, we present Consistent-1-to-3, which is a generative framework that significantly mitigates this issue. Specifically, we decompose the NVS task into two stages: (i) transforming observed regions to a novel view, and (ii) hallucinating unseen regions. We design a scene representation transformer and view-conditioned diffusion model for performing these two stages respectively. Inside the models, to enforce 3D consistency, we propose to employ epipolar-guided attention to incorporate geometry constraints, and multi-view attention to better aggregate multi-view information. Finally, we design a hierarchy generation paradigm to generate long sequences of consistent views, allowing a full 360-degree observation of the provided object image. Qualitative and quantitative evaluation over multiple datasets demonstrates the effectiveness of the proposed mechanisms against state-of-the-art approaches. Our project page is at https://jianglongye.com/consistent123/
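The abstract does not say how the epipolar constraint enters the attention layers, so the following is only a minimal sketch of the underlying geometry, not the paper's implementation. It assumes a pinhole camera with a shared focal length `f` and principal point `(cx, cy)`, and a relative pose `X_tgt = R @ X_src + t`. All function names (`fundamental_matrix`, `epipolar_distance`, `epipolar_mask`) and the distance threshold are hypothetical. A pixel in the source view maps to a line in the target view, and a boolean mask around that line is one plausible way to restrict which target locations a source pixel may attend to.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def fundamental_matrix(R, t, f, cx, cy):
    """F such that x_tgt^T F x_src = 0, assuming X_tgt = R X_src + t.

    Both views are assumed to share K = [[f, 0, cx], [0, f, cy], [0, 0, 1]].
    """
    K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    E = matmul(skew(t), R)  # essential matrix
    return matmul(matmul(transpose(K_inv), E), K_inv)


def epipolar_distance(F, src_px, tgt_px):
    """Pixel distance from tgt_px to the epipolar line induced by src_px."""
    a, b, c = matvec(F, [src_px[0], src_px[1], 1.0])
    return abs(a * tgt_px[0] + b * tgt_px[1] + c) / math.hypot(a, b)


def epipolar_mask(F, src_px, width, height, threshold=1.0):
    """mask[v][u] is True when target pixel (u, v) lies near the epipolar line.

    In an attention layer, entries that are False could be set to -inf
    before the softmax (an assumption, not a detail from the paper).
    """
    return [[epipolar_distance(F, src_px, (u, v)) <= threshold
             for u in range(width)]
            for v in range(height)]


if __name__ == "__main__":
    # Pure sideways translation: epipolar lines are horizontal image rows.
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    F = fundamental_matrix(identity, (1.0, 0.0, 0.0), f=64.0, cx=32.0, cy=32.0)
    mask = epipolar_mask(F, (10, 20), width=64, height=64, threshold=0.5)
    print([v for v in range(64) if any(mask[v])])
```

For the sideways translation in the usage example, the epipolar line of source pixel `(10, 20)` is the image row `v = 20`, so only that row of the mask is set. For a general rotation and translation the line is oblique, and the same mask construction applies unchanged.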

