CV · Jun 29, 2023

ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models

arXiv:2306.17140v2 · 9 citations · h-index: 21
Originality: Incremental advance
AI Analysis

This work addresses a long-standing and intractable problem in 3D vision, with applications such as robotics and augmented reality, though it is an incremental advance that builds on an existing pre-trained diffusion model.

The paper tackles camera pose estimation from sparse views of a 3D object by inverting a pre-trained diffusion model (Zero-1-to-3) to predict the relative pose between pairs of images. In experiments with casually captured photos and rendered images, it significantly outperforms state-of-the-art methods.

Given sparse views of a 3D object, estimating their camera poses is a long-standing and intractable problem. Toward this goal, we consider harnessing the pre-trained diffusion model of novel views conditioned on viewpoints (Zero-1-to-3). We present ID-Pose, which inverts the denoising diffusion process to estimate the relative pose given two input images. ID-Pose adds noise to one image and predicts that noise conditioned on the other image and a hypothesis of the relative pose. The prediction error serves as the minimization objective, and the optimal pose is found by gradient descent. We extend ID-Pose to handle more than two images, estimating each pose from multiple image pairs via triangular relations. ID-Pose requires no training and generalizes to open-world images. We conduct extensive experiments using casually captured photos and rendered images with random viewpoints. The results demonstrate that ID-Pose significantly outperforms state-of-the-art methods.
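To make the pairwise objective concrete, here is a minimal sketch in Python under heavily simplified, hypothetical assumptions: each "view" is a 2-D point, the relative pose is a single rotation angle, and `toy_render` / `toy_predict_noise` are illustrative stand-ins for Zero-1-to-3 (not part of any real API). It mirrors the steps in the abstract: noise is added to the target view, a noise predictor conditioned on the other view and a pose hypothesis estimates that noise, and the pose is updated by gradient descent on the prediction error. The multi-view part (`fuse_estimates`, which combines a direct estimate with one composed through a third view using a circular mean) is only an assumed illustration of triangular relations, not the paper's exact procedure.

```python
import math
import random

# Hypothetical toy stand-ins: a "view" is a 2-D point and a relative pose is a
# single rotation angle (radians). A real system would use images and a
# pre-trained view-conditioned model such as Zero-1-to-3 instead.

def toy_render(cond_view, rel_pose):
    """Hypothetical generator: the target view is cond_view rotated by rel_pose."""
    x, y = cond_view
    c, s = math.cos(rel_pose), math.sin(rel_pose)
    return (c * x - s * y, s * x + c * y)

def toy_predict_noise(x_t, alpha_bar, cond_view, rel_pose):
    """Hypothetical noise predictor conditioned on the other view and a pose guess.

    It assumes the clean target is toy_render(cond_view, rel_pose) and returns
    the noise that would explain x_t under that assumption."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    x0_hat = toy_render(cond_view, rel_pose)
    return tuple((xt - a * x0) / b for xt, x0 in zip(x_t, x0_hat))

def noise_error(target, cond, rel_pose, samples):
    """Mean squared error between injected and predicted noise (the objective)."""
    total = 0.0
    for alpha_bar, eps in samples:
        a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
        x_t = tuple(a * x0 + b * e for x0, e in zip(target, eps))  # add noise to target
        eps_hat = toy_predict_noise(x_t, alpha_bar, cond, rel_pose)
        total += sum((p - e) ** 2 for p, e in zip(eps_hat, eps))
    return total / len(samples)

def estimate_relative_pose(target, cond, init=0.0, lr=0.1, steps=200, seed=0):
    """Gradient descent on the pose hypothesis (central-difference gradient)."""
    rng = random.Random(seed)
    # Fixed noise levels and noise draws, reused at every step.
    samples = [(rng.uniform(0.2, 0.8), (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)))
               for _ in range(8)]
    pose, h = init, 1e-4
    for _ in range(steps):
        grad = (noise_error(target, cond, pose + h, samples)
                - noise_error(target, cond, pose - h, samples)) / (2.0 * h)
        pose -= lr * grad
    return pose

def fuse_estimates(angles):
    """Circular mean of several estimates of the same relative angle."""
    return math.atan2(sum(math.sin(a) for a in angles),
                      sum(math.cos(a) for a in angles))

if __name__ == "__main__":
    v0 = (1.0, 0.0)
    v1, v2 = toy_render(v0, 0.5), toy_render(v0, 1.2)
    direct = estimate_relative_pose(v2, v0)  # pair (0, 2)
    via_1 = estimate_relative_pose(v1, v0) + estimate_relative_pose(v2, v1)  # 0 -> 1 -> 2
    print(round(fuse_estimates([direct, via_1]), 3))
```

In this toy the injected noise cancels exactly, so the error reduces to a noise-level-weighted squared distance between the true and hypothesized target views. With a real learned model the error would be noisy, so averaging over several noise levels and draws is a reasonable choice, and the gradient would be obtained by backpropagating through the model rather than by finite differences.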

