CV | Mar 2, 2024

Consistent and Optimal Solution to Camera Motion Estimation

arXiv:2403.01174v2 | 2 citations | h-index: 4 | IEEE Trans Pattern Anal Mach Intell
Originality: Incremental advance
AI Analysis

This work addresses a fundamental problem in computer vision, with applications such as 3D reconstruction. It offers a solution that is closer to optimal, but the advance is incremental because it builds on existing epipolar-constraint methods.

The paper tackles camera motion estimation from 2D point correspondences by formulating the maximum likelihood problem and proposing a two-step algorithm: a consistent initial estimate followed by a refinement step. The resulting estimator is asymptotically efficient and runs in linear time, and it outperforms state-of-the-art methods in accuracy and CPU time once the number of points reaches the order of hundreds.

Given 2D point correspondences between an image pair, inferring the camera motion is a fundamental problem in computer vision. Existing works generally start from the epipolar constraint and estimate the essential matrix, which is not optimal in the maximum likelihood (ML) sense. In this paper, we return to the original measurement model with respect to the rotation matrix and normalized translation vector and formulate the ML problem. We then propose a two-step algorithm to solve it. In the first step, we estimate the variance of the measurement noise and devise a consistent estimator based on bias elimination. In the second step, we execute a one-step Gauss-Newton iteration on the manifold to refine the consistent estimate. We prove that the proposed estimate has the same asymptotic statistical properties as the ML estimate. The first is consistency, i.e., the estimate converges to the ground truth as the number of points increases. The second is asymptotic efficiency, i.e., the mean squared error of the estimate converges to the theoretical lower bound, the Cramér-Rao bound. In addition, we show that our algorithm has linear time complexity. These properties give our estimator a great advantage in the case of dense point correspondences. Experiments on both synthetic data and real images demonstrate that when the number of points reaches the order of hundreds, our estimator outperforms state-of-the-art ones in terms of estimation accuracy and CPU time.
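
For readers unfamiliar with the epipolar constraint that the abstract contrasts against, the following is a minimal sketch of that baseline relation only, not the paper's ML estimator or its bias-elimination step. It assumes the convention P2 = R P1 + t for a 3D point expressed in the two camera frames, normalized (calibrated) image coordinates, and a unit-norm translation; all function names and the synthetic data are hypothetical and chosen for illustration.

```python
import numpy as np


def skew(v):
    """Return the 3x3 matrix [v]_x such that [v]_x @ w equals np.cross(v, w)."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])


def rotation_from_axis_angle(axis, angle):
    """Build a rotation matrix with Rodrigues' formula."""
    axis = np.asarray(axis, dtype=float)
    K = skew(axis / np.linalg.norm(axis))
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def essential_matrix(R, t):
    """E = [t]_x R with t scaled to unit norm (assumed convention P2 = R P1 + t)."""
    t = np.asarray(t, dtype=float)
    return skew(t / np.linalg.norm(t)) @ R


def epipolar_residuals(E, x1, x2):
    """Per-point residual x2_i^T E x1_i for (N, 3) homogeneous coordinates."""
    return np.einsum("ni,ij,nj->n", x2, E, x1)


# Synthetic scene (illustrative values): 200 points in front of both cameras.
rng = np.random.default_rng(0)
P1 = np.column_stack([rng.uniform(-1, 1, 200),
                      rng.uniform(-1, 1, 200),
                      rng.uniform(4, 8, 200)])
R = rotation_from_axis_angle([0.2, 1.0, 0.1], 0.1)
t = np.array([1.0, 0.2, 0.1])
P2 = P1 @ R.T + t

# Project to normalized image coordinates (divide by depth).
x1 = P1 / P1[:, 2:3]
x2 = P2 / P2[:, 2:3]

E = essential_matrix(R, t)
clean = epipolar_residuals(E, x1, x2)

# Add small Gaussian noise to the 2D measurements only.
sigma = 1e-3
x1_noisy = x1 + np.column_stack([rng.normal(0, sigma, (200, 2)), np.zeros(200)])
x2_noisy = x2 + np.column_stack([rng.normal(0, sigma, (200, 2)), np.zeros(200)])
noisy = epipolar_residuals(E, x1_noisy, x2_noisy)

print("max |residual|, noise-free:", np.max(np.abs(clean)))
print("mean |residual|, noisy:    ", np.mean(np.abs(noisy)))
```

With noise-free correspondences the residuals vanish up to floating-point error, while noisy measurements leave nonzero residuals even at the true motion. Because the constraint multiplies the two noisy measurements together, the noise does not enter it additively, which is in line with the abstract's point that methods built on this constraint are not ML-optimal.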

Code Implementations: 1 repo
