CV · LG · Sep 1, 2021

On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation

arXiv:2109.00524v1 · 84 citations
Originality: Synthesis-oriented
AI Analysis

This work highlights a methodological issue in benchmarking visual re-localisation: because benchmark poses are produced by a reference algorithm rather than measured directly, published rankings and claims may depend on which reference algorithm was used.

The paper analyzes two widely used visual camera re-localisation datasets and shows that evaluation outcomes vary with the choice of the reference algorithm used to generate pseudo ground truth. It questions common beliefs such as learning-based scene coordinate regression outperforming classical feature-based methods.

Benchmark datasets that measure camera pose accuracy have driven progress in visual re-localisation research. To obtain poses for thousands of images, it is common to use a reference algorithm to generate pseudo ground truth. Popular choices include Structure-from-Motion (SfM) and Simultaneous-Localisation-and-Mapping (SLAM), using additional sensors like depth cameras if available. Re-localisation benchmarks thus measure how well each method replicates the results of the reference algorithm. This raises the question of whether the choice of the reference algorithm favours a certain family of re-localisation methods. This paper analyzes two widely used re-localisation datasets and shows that evaluation outcomes indeed vary with the choice of the reference algorithm. We thus question common beliefs in the re-localisation literature, namely that learning-based scene coordinate regression outperforms classical feature-based methods, and that RGB-D-based methods outperform RGB-based methods. We argue that any claims on ranking re-localisation methods should take the type of the reference algorithm, and the similarity of the methods to the reference algorithm, into account.
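To make the point about replicating a reference algorithm concrete, the following is a minimal sketch of how a pose-accuracy score might be computed against pseudo ground truth. It is not the paper's evaluation code. The helper names (`rot_z`, `rotation_error_deg`, `translation_error`, `recall`), the toy poses, and the two reference sets `ref_a` and `ref_b` are all hypothetical. The 5 cm / 5 degree threshold is one commonly used choice and is assumed here, as is the convention that each pose is a rotation matrix plus a camera position in metres.

```python
import math

def rot_z(deg):
    """Hypothetical helper: 3x3 rotation about the z-axis as nested lists."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotation_error_deg(R_est, R_ref):
    """Angle (degrees) of the relative rotation between two rotation matrices."""
    # trace(R_est^T R_ref) equals the sum of element-wise products.
    tr = sum(R_est[i][j] * R_ref[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (tr - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def translation_error(t_est, t_ref):
    """Euclidean distance between two camera positions (metres assumed)."""
    return math.dist(t_est, t_ref)

def recall(estimates, references, max_t=0.05, max_r_deg=5.0):
    """Fraction of images whose pose is within both thresholds of the reference."""
    hits = sum(
        1
        for (R_e, t_e), (R_r, t_r) in zip(estimates, references)
        if translation_error(t_e, t_r) <= max_t
        and rotation_error_deg(R_e, R_r) <= max_r_deg
    )
    return hits / len(estimates)

# Toy data: one method's estimates for two images.
estimates = [(rot_z(0.0), (0.0, 0.0, 0.0)), (rot_z(10.0), (1.0, 0.0, 0.0))]

# Two hypothetical pseudo ground truths for the same images, standing in for
# the outputs of two different reference algorithms.
ref_a = [(rot_z(1.0), (0.01, 0.0, 0.0)), (rot_z(10.5), (1.02, 0.0, 0.0))]
ref_b = [(rot_z(1.0), (0.01, 0.0, 0.0)), (rot_z(12.0), (1.08, 0.0, 0.0))]

recall_a = recall(estimates, ref_a)  # 1.0: both images within 5 cm / 5 degrees
recall_b = recall(estimates, ref_b)  # 0.5: second image is 8 cm from ref_b
print(recall_a, recall_b)
```

In this toy setting the estimates never change, yet the score drops from 1.0 to 0.5 when the reference poses are swapped. The score therefore reflects agreement with the reference as much as accuracy in any absolute sense.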

Code Implementations: 1 repo