CV · Apr 9, 2024

Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences

arXiv:2404.06337v1 · 36 citations · h-index: 40 · CVPR
Originality: Highly original
AI Analysis

MicKey enables instant augmented reality applications by providing scale-metric pose estimates without external depth estimators, though the contribution is incremental in that it builds on existing keypoint matching methods.

The paper tackles the problem of estimating metric relative camera pose from two images without requiring depth measurements, by introducing MicKey, a keypoint matching pipeline that predicts metric correspondences in 3D camera space. It achieves state-of-the-art performance on the Map-Free Relocalisation benchmark.

Given two images, we can estimate the relative camera pose between them by establishing image-to-image correspondences. Usually, correspondences are 2D-to-2D and the pose we estimate is defined only up to scale. Some applications, aiming at instant augmented reality anywhere, require scale-metric pose estimates, and hence, they rely on external depth estimators to recover the scale. We present MicKey, a keypoint matching pipeline that is able to predict metric correspondences in 3D camera space. By learning to match 3D coordinates across images, we are able to infer the metric relative pose without depth measurements. Depth measurements are also not required for training, nor are scene reconstructions or image overlap information. MicKey is supervised only by pairs of images and their relative poses. MicKey achieves state-of-the-art performance on the Map-Free Relocalisation benchmark while requiring less supervision than competing approaches.
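To make the step from metric 3D correspondences to a metric relative pose concrete, here is a minimal sketch of one standard way to do it: a least-squares rigid alignment (the Kabsch algorithm) between matched 3D points expressed in each camera's frame. This is an illustration under assumptions, not the pipeline described in the abstract: the function name `kabsch_pose`, the synthetic points, and the noise-free, outlier-free setting are all hypothetical, and a real system would typically wrap such a solver in a robust estimation loop.

```python
import numpy as np


def kabsch_pose(pts_a, pts_b):
    """Least-squares rigid transform (R, t) such that pts_b ~= R @ pts_a + t.

    pts_a, pts_b: (N, 3) arrays of matched 3D points in metres, each expressed
    in its own camera frame. Hypothetical helper, for illustration only.
    """
    pts_a = np.asarray(pts_a, dtype=float)
    pts_b = np.asarray(pts_b, dtype=float)

    # Centre both point sets so that only the rotation remains to be solved.
    centroid_a = pts_a.mean(axis=0)
    centroid_b = pts_b.mean(axis=0)

    # 3x3 cross-covariance between the centred point sets.
    H = (pts_a - centroid_a).T @ (pts_b - centroid_b)
    U, _, Vt = np.linalg.svd(H)

    # Guard against a reflection: force det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # The translation carries the metric scale of the input points.
    t = centroid_b - R @ centroid_a
    return R, t


# Synthetic check (assumed values): rotate 30 degrees about the y-axis
# and translate by a known metric offset.
angle = np.deg2rad(30.0)
R_true = np.array([
    [np.cos(angle), 0.0, np.sin(angle)],
    [0.0, 1.0, 0.0],
    [-np.sin(angle), 0.0, np.cos(angle)],
])
t_true = np.array([0.5, 0.0, -0.2])  # metres

rng = np.random.default_rng(0)
pts_a = rng.uniform(-1.0, 1.0, size=(50, 3)) + np.array([0.0, 0.0, 3.0])
pts_b = pts_a @ R_true.T + t_true

R_est, t_est = kabsch_pose(pts_a, pts_b)
```

Because the inputs are 3D points in metric units, the recovered translation `t_est` is itself metric. With 2D-to-2D correspondences alone, the translation direction can be estimated but its length cannot, which is the scale ambiguity the abstract refers to.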

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
