Evaluation of Visual Place Recognition Methods for Image Pair Retrieval in 3D Vision and Robotics
This work addresses the need for robust image pair retrieval in registration pipelines for applications such as SLAM and mapping. Its contribution is incremental: it is a comparative evaluation of existing methods rather than a new method.
The paper studies Visual Place Recognition (VPR) as a tool for image pair retrieval in 3D vision and robotics. It evaluates state-of-the-art methods on three datasets (Tanks and Temples, ScanNet-GS, and KITTI) and finds that modern global descriptor approaches are effective as off-the-shelf modules, though their performance depends on the domain.
Visual Place Recognition (VPR) is a core component in computer vision, typically formulated as an image retrieval task for localization, mapping, and navigation. In this work, we instead study VPR as an image pair retrieval front-end for registration pipelines, where the goal is to find top-matching image pairs between two disjoint image sets for downstream tasks such as scene registration, SLAM, and Structure-from-Motion. We comparatively evaluate state-of-the-art VPR families - NetVLAD-style baselines, classification-based global descriptors (CosPlace, EigenPlaces), feature-mixing (MixVPR), and foundation-model-driven methods (AnyLoc, SALAD, MegaLoc) - on three challenging datasets: object-centric outdoor scenes (Tanks and Temples), indoor RGB-D scans (ScanNet-GS), and autonomous-driving sequences (KITTI). We show that modern global descriptor approaches are increasingly suitable as off-the-shelf image pair retrieval modules in challenging scenarios including perceptual aliasing and incomplete sequences, while exhibiting clear, domain-dependent strengths and weaknesses that are critical when choosing VPR components for robust mapping and registration.
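To make the image pair retrieval setting concrete, the following is a minimal sketch of how top-matching pairs between two disjoint image sets could be selected from global descriptors. It is not the paper's evaluation protocol: the function name `top_k_pairs` is hypothetical, the descriptors are assumed to be non-zero vectors already produced by some VPR model, and cosine similarity with an exhaustive top-k search is an assumed scoring choice.

```python
import numpy as np


def top_k_pairs(desc_a, desc_b, k=5):
    """Return the k highest-scoring (i, j, score) pairs between two descriptor sets.

    desc_a: array of shape (N, D), one global descriptor per image in set A.
    desc_b: array of shape (M, D), one global descriptor per image in set B.
    Assumes every descriptor has non-zero norm.
    """
    a = np.asarray(desc_a, dtype=np.float64)
    b = np.asarray(desc_b, dtype=np.float64)

    # L2-normalize rows so that the dot product equals cosine similarity.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)

    # Cross-set similarity matrix of shape (N, M).
    sim = a @ b.T

    # Indices of the k largest entries of the flattened matrix, best first.
    k = min(k, sim.size)
    flat = np.argsort(sim, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, sim.shape)
    return [(int(i), int(j), float(sim[i, j])) for i, j in zip(rows, cols)]


if __name__ == "__main__":
    # Random stand-ins for real VPR descriptors (illustrative only).
    rng = np.random.default_rng(0)
    set_a = rng.normal(size=(4, 8))
    set_b = rng.normal(size=(6, 8))
    # Plant one near-duplicate so a true cross-set match exists.
    set_b[2] = set_a[1] + 0.01 * rng.normal(size=8)

    for i, j, score in top_k_pairs(set_a, set_b, k=3):
        print(f"A[{i}] <-> B[{j}]  cosine={score:.3f}")
```

The returned pairs would then be handed to a downstream registration step. For large image sets, the exhaustive similarity matrix would typically be replaced by an approximate nearest-neighbor index.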