Beyond First-Order: Learning Riemannian Geometries for Invariant Visual Place Recognition
For visual place recognition, this work provides a geometric framework that reduces reliance on supervised training while improving robustness to extreme environmental and viewpoint shifts.
Visual Place Recognition (VPR) degrades under extreme environmental and viewpoint shifts. The proposed Riemannian Invariant Aggregation (RIA) framework models second-order scene structure on the Symmetric Positive Definite (SPD) manifold, achieving zero-shot performance comparable to supervised methods and state-of-the-art accuracy with fine-tuning.
Visual Place Recognition (VPR) demands representations robust to drastic environmental and viewpoint shifts. Existing aggregation paradigms either depend on extensive supervised training, which incurs high adaptation costs, or rely on first-order pooling, which often struggles to preserve structural correlations under extreme shifts. In this work, we propose Riemannian Invariant Aggregation (RIA), a unified geometric framework that explicitly models second-order scene structure on the Symmetric Positive Definite (SPD) manifold. By treating perturbations as tractable congruence transformations, RIA uses geometry-aware Riemannian mappings to project covariance descriptors into a linearized Euclidean space, preserving invariant structural components while suppressing noise. Extensive evaluations demonstrate that RIA achieves zero-shot performance comparable to supervised methods and establishes state-of-the-art accuracy with simple fine-tuning, particularly in unstructured environments. The source code will be released.
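To make the second-order pipeline concrete, the following is a minimal sketch of covariance pooling followed by a log-Euclidean mapping, one common way to linearize SPD matrices. The abstract does not specify which Riemannian mapping RIA uses, so the choice of the matrix logarithm, the regularizer `eps`, and all function names here are illustrative assumptions and not the authors' implementation.

```python
import numpy as np


def covariance_descriptor(features, eps=1e-5):
    """Pool (N, D) local features into a (D, D) covariance matrix.

    `eps` is an assumed regularizer that keeps the result strictly
    positive definite when N is small relative to D.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(features.shape[0] - 1, 1)
    return cov + eps * np.eye(cov.shape[0])


def log_map(spd):
    """Matrix logarithm of an SPD matrix via eigendecomposition.

    Maps the SPD manifold to the flat space of symmetric matrices
    (the tangent space at the identity).
    """
    eigvals, eigvecs = np.linalg.eigh(spd)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def vectorize(sym):
    """Flatten a symmetric matrix to its upper triangle.

    Off-diagonal entries are scaled by sqrt(2) so that the Euclidean
    norm of the vector equals the Frobenius norm of the matrix.
    """
    rows, cols = np.triu_indices(sym.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return sym[rows, cols] * weights


def describe(features):
    """Hypothetical end-to-end descriptor: covariance -> log -> vector."""
    return vectorize(log_map(covariance_descriptor(features)))
```

Under this sketch, a linear perturbation `A` of the local features (`features @ A.T`) changes the covariance `C` to `A @ C @ A.T`, which is the congruence transformation the abstract refers to. Descriptors produced by `describe` can then be compared with ordinary Euclidean distance.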