Joint Multi-Condition Representation Modelling via Matrix Factorisation for Visual Place Recognition
This work addresses the challenge of improving localisation performance under varying conditions for robotics and autonomous systems, though the contribution is incremental, as it builds on existing descriptor-level fusion approaches.
The paper tackles multi-reference visual place recognition by proposing a training-free, descriptor-agnostic method that uses matrix decomposition to jointly model places from multiple reference descriptors. On multi-appearance data it improves Recall@1 by up to ~18% over single-reference matching, and it outperforms multi-reference baselines across appearance and viewpoint changes, with gains of ~5% on unstructured data.
We address multi-reference visual place recognition (VPR), where reference sets captured under varying conditions are used to improve localisation performance. While deep learning with large-scale training improves robustness, increasing data diversity and model complexity incur extensive computational cost during training and deployment. Descriptor-level fusion via voting or aggregation avoids training, but often targets multi-sensor setups or relies on heuristics with limited gains under appearance and viewpoint change. We propose a training-free, descriptor-agnostic approach that jointly models places using multiple reference descriptors via matrix decomposition into basis representations, enabling projection-based residual matching. We also introduce SotonMV, a structured benchmark for multi-viewpoint VPR. On multi-appearance data, our method improves Recall@1 by up to ~18% over single-reference and outperforms multi-reference baselines across appearance and viewpoint changes, with gains of ~5% on unstructured data, demonstrating strong generalisation while remaining lightweight.
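The idea of decomposing a place's reference descriptors into basis representations and matching by projection residual can be illustrated with a minimal sketch. The abstract does not say which factorisation is used, so the truncated SVD below is an assumption, as are the function names `fit_place_basis`, `projection_residual`, and `match_place`, the `rank` parameter, and the L2 normalisation step. It is an illustration of the general technique, not the authors' implementation.

```python
import numpy as np


def fit_place_basis(ref_descriptors, rank):
    """Build an orthonormal basis for one place from its reference descriptors.

    ref_descriptors: (n_refs, dim) array, one descriptor per condition.
    rank: number of basis vectors to keep (assumed hyperparameter, <= n_refs).
    """
    X = np.asarray(ref_descriptors, dtype=float)
    # L2-normalise each descriptor so no single reference dominates (assumption).
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Rows of Vt are orthonormal directions in descriptor space,
    # ordered by how much of the reference set they explain.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:rank]  # shape (rank, dim)


def projection_residual(query, basis):
    """Norm of the part of the unit-normalised query outside the place subspace."""
    q = np.asarray(query, dtype=float)
    q = q / np.linalg.norm(q)
    projected = basis.T @ (basis @ q)  # orthogonal projection onto span(basis)
    return float(np.linalg.norm(q - projected))


def match_place(query, bases):
    """Return the index of the place with the smallest residual, and all residuals."""
    residuals = np.array([projection_residual(query, B) for B in bases])
    return int(np.argmin(residuals)), residuals


if __name__ == "__main__":
    # Synthetic stand-in data: 3 places, 4 reference conditions each, 64-D descriptors.
    rng = np.random.default_rng(0)
    centres = rng.normal(size=(3, 64))
    bases = [fit_place_basis(c + 0.1 * rng.normal(size=(4, 64)), rank=2)
             for c in centres]
    query = centres[1] + 0.1 * rng.normal(size=64)
    best, residuals = match_place(query, bases)
    print("best place:", best, "residuals:", residuals)
```

Because the sketch only consumes descriptor vectors, it works with any descriptor extractor and needs no training, which mirrors the descriptor-agnostic, training-free framing of the abstract. The per-place cost is one small SVD offline and two matrix-vector products per query.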