CV · Apr 4, 2024

The More You See in 2D, the More You Perceive in 3D

Berkeley
arXiv:2404.03652v1 · 8 citations · h-index: 54 · CVPR
Originality: Incremental advance
AI Analysis

This paper addresses the problem of perceiving 3D structure from 2D images in computer vision. It represents an incremental advance that combines and adapts existing techniques.

The paper tackles 3D reconstruction and novel view synthesis from unposed images by introducing SAP3D, a system that jointly adapts a pre-trained view-conditioned diffusion model and the input camera poses via test-time fine-tuning. It shows that performance improves as more input images are provided, bridging the gap between prior-less optimization-based reconstruction and single-image-to-3D diffusion-based methods.

Humans can infer 3D structure from 2D images of an object based on past experience and improve their 3D understanding as they see more images. Inspired by this behavior, we introduce SAP3D, a system for 3D reconstruction and novel view synthesis from an arbitrary number of unposed images. Given a few unposed images of an object, we adapt a pre-trained view-conditioned diffusion model together with the camera poses of the images via test-time fine-tuning. The adapted diffusion model and the obtained camera poses are then utilized as instance-specific priors for 3D reconstruction and novel view synthesis. We show that as the number of input images increases, the performance of our approach improves, bridging the gap between optimization-based prior-less 3D reconstruction methods and single-image-to-3D diffusion-based methods. We demonstrate our system on real images as well as standard synthetic benchmarks. Our ablation studies confirm that this adaptation behavior is key for more accurate 3D understanding.
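As a toy illustration only, and not SAP3D's actual procedure (which fine-tunes a view-conditioned diffusion model), the sketch below shows the general idea of test-time adaptation in which an instance-specific model and unknown camera poses are refined jointly against the input views. Everything here is an assumption made for illustration: the "model" is a hypothetical set of 2D points initialized from a generic prior, each "camera pose" is a single rotation angle, the first view is held fixed as the reference frame to remove the global rotation ambiguity, and plain gradient descent on a squared reprojection error stands in for diffusion fine-tuning.

```python
import math


def rotate(points, theta):
    """Rotate 2D points by theta radians about the origin (toy 'camera')."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def loss(shape, poses, views):
    """Mean squared error between each observed view and the rotated shape."""
    total, n = 0.0, 0
    for theta, obs in zip(poses, views):
        for (px, py), (ox, oy) in zip(rotate(shape, theta), obs):
            total += (px - ox) ** 2 + (py - oy) ** 2
            n += 1
    return total / n


def adapt(prior_shape, init_poses, views, lr=0.1, steps=3000):
    """Jointly refine the toy 'model' (shape) and poses by gradient descent.

    Assumption of this sketch: pose 0 stays fixed as the reference frame.
    """
    shape = [list(p) for p in prior_shape]
    poses = list(init_poses)
    n = len(shape) * len(views)
    for _ in range(steps):
        g_shape = [[0.0, 0.0] for _ in shape]
        g_poses = [0.0] * len(poses)
        for k, (theta, obs) in enumerate(zip(poses, views)):
            c, s = math.cos(theta), math.sin(theta)
            for i, ((x, y), (ox, oy)) in enumerate(zip(shape, obs)):
                rx, ry = c * x - s * y, s * x + c * y
                ex, ey = rx - ox, ry - oy  # residual in view k
                # d(loss)/d(point) = (2/n) * R(theta)^T @ residual
                g_shape[i][0] += 2 * (c * ex + s * ey) / n
                g_shape[i][1] += 2 * (-s * ex + c * ey) / n
                # d(rotated point)/d(theta) = (-ry, rx)
                g_poses[k] += 2 * (ex * (-ry) + ey * rx) / n
        for i in range(len(shape)):
            shape[i][0] -= lr * g_shape[i][0]
            shape[i][1] -= lr * g_shape[i][1]
        for k in range(1, len(poses)):  # skip the fixed reference pose
            poses[k] -= lr * g_poses[k]
    return [tuple(p) for p in shape], poses


# Hypothetical example: ground-truth object and poses used to synthesize views.
true_shape = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.5, -0.8), (0.3, 0.3)]
true_poses = [0.0, 0.7, 1.6, -1.1]
views = [rotate(true_shape, t) for t in true_poses]

# A generic 'prior' that is close to, but not specific to, this instance,
# plus rough initial pose guesses.
prior_shape = [(x + 0.1, y - 0.05) for x, y in true_shape]
init_poses = [0.0, 0.5, 1.8, -0.9]

initial_loss = loss(prior_shape, init_poses, views)
shape, poses = adapt(prior_shape, init_poses, views)
final_loss = loss(shape, poses, views)
print(f"loss: {initial_loss:.4f} -> {final_loss:.2e}")
print("estimated poses:", [round(p, 3) for p in poses])
```

Fixing one view as the reference is a common way to remove the gauge freedom in joint pose and structure estimation; without it, rotating the shape and shifting every pose by the same angle would leave the loss unchanged.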

Code Implementations: 1 repo
