CV LG

Sep 1, 2021

On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation

Eric Brachmann, Martin Humenberger, Carsten Rother, Torsten Sattler

arXiv:2109.00524v122.284 citations

Has Code

Originality: Synthesis-oriented

AI Analysis

This work highlights a critical methodological issue in benchmarking visual re-localisation methods, one that may affect published rankings and claims in the field.

The paper analyzes two widely used visual camera re-localisation datasets and shows that evaluation outcomes vary with the choice of the reference algorithm used to generate pseudo ground truth. This calls into question common beliefs, such as that learning-based methods outperform classical ones.

Benchmark datasets that measure camera pose accuracy have driven progress in visual re-localisation research. To obtain poses for thousands of images, it is common to use a reference algorithm to generate pseudo ground truth. Popular choices include Structure-from-Motion (SfM) and Simultaneous-Localisation-and-Mapping (SLAM) using additional sensors like depth cameras if available. Re-localisation benchmarks thus measure how well each method replicates the results of the reference algorithm. This begs the question whether the choice of the reference algorithm favours a certain family of re-localisation methods. This paper analyzes two widely used re-localisation datasets and shows that evaluation outcomes indeed vary with the choice of the reference algorithm. We thus question common beliefs in the re-localisation literature, namely that learning-based scene coordinate regression outperforms classical feature-based methods, and that RGB-D-based methods outperform RGB-based methods. We argue that any claims on ranking re-localisation methods should take the type of the reference algorithm, and the similarity of the methods to the reference algorithm, into account.
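To make the abstract's central point concrete, the following is a minimal sketch of how a re-localisation benchmark might score estimated poses against pseudo ground truth. It is not the authors' evaluation code. The function names, the representation of a pose as a rotation matrix plus camera position, and the 5 cm / 5 degree thresholds are all assumptions chosen for illustration, as are the toy numbers.

```python
import math

def rotation_error_deg(R_est, R_ref):
    """Angle (degrees) of the relative rotation between two 3x3 rotation matrices."""
    # trace(R_est^T R_ref) equals the sum of element-wise products.
    trace = sum(R_est[i][j] * R_ref[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against rounding before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def translation_error(t_est, t_ref):
    """Euclidean distance between two camera positions (same unit as the input)."""
    return math.dist(t_est, t_ref)

def recall(estimates, references, max_t=0.05, max_r_deg=5.0):
    """Fraction of poses within both thresholds of the reference poses.

    The default thresholds (5 cm, 5 degrees) are an assumption for this sketch.
    """
    hits = sum(
        1
        for (R_e, t_e), (R_r, t_r) in zip(estimates, references)
        if translation_error(t_e, t_r) <= max_t
        and rotation_error_deg(R_e, R_r) <= max_r_deg
    )
    return hits / len(references)

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# One set of estimated poses, scored against two hypothetical references
# (for example, one reference built with SfM and one with depth-based SLAM).
estimates = [(IDENTITY, (0.00, 0.0, 0.0)), (IDENTITY, (1.00, 0.0, 0.0))]
reference_a = [(IDENTITY, (0.01, 0.0, 0.0)), (IDENTITY, (1.02, 0.0, 0.0))]
reference_b = [(IDENTITY, (0.04, 0.0, 0.0)), (IDENTITY, (1.08, 0.0, 0.0))]

recall_a = recall(estimates, reference_a)
recall_b = recall(estimates, reference_b)
print(recall_a, recall_b)
```

In this toy setup the same estimates score differently depending on which reference they are compared with, even though the estimates themselves never change. The score therefore measures agreement with the reference algorithm rather than distance to the true camera pose.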
