$L^3$: Scene-agnostic Visual Localization in the Wild
This work addresses the problem of reducing computational and storage overhead for visual localization in the wild, which is beneficial for applications requiring real-time performance and adaptability to new environments.
The paper introduces $L^3$, a map-free visual localization framework that performs online 3D reconstruction from RGB images, followed by metric scale recovery and pose refinement. This approach achieves performance comparable to state-of-the-art methods on various benchmarks and demonstrates significantly superior robustness in sparse scenes.
Standard visual localization methods typically require offline pre-processing of scenes to obtain 3D structural information for better performance. This inevitably introduces additional computational and time costs, as well as the overhead of storing scene representations. Can we visually localize in a wild scene without any offline pre-processing step? In this paper, we leverage the online inference capabilities of feed-forward 3D reconstruction networks to propose a novel map-free visual localization framework, $L^3$. Specifically, by performing direct online 3D reconstruction on RGB images, followed by two-stage metric scale recovery and pose refinement based on 2D-3D correspondences, $L^3$ achieves high accuracy without the need to pre-build or store any offline scene representations. Extensive experiments demonstrate that $L^3$ not only achieves performance comparable to state-of-the-art solutions on various benchmarks, but also exhibits significantly superior robustness in sparse scenes (fewer reference images per scene).
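To make the metric scale recovery step concrete, the following is a minimal sketch of one common way to fix the scale of an up-to-scale reconstruction. It is not the procedure of $L^3$, whose two-stage design is not detailed in this abstract. It assumes that the reference images come with known metric camera centers, and the function name `estimate_metric_scale` and its arguments are hypothetical. The scale is taken as the median ratio between pairwise camera distances in the metric frame and in the reconstruction frame, which is invariant to the unknown rotation and translation between the two frames.

```python
import numpy as np


def estimate_metric_scale(recon_centers, metric_centers):
    """Estimate the factor that maps reconstruction units to metric units.

    Assumption (not from the paper): the same reference cameras are
    available both in the up-to-scale reconstruction frame and in a
    metric frame, given as (N, 3) arrays of camera centers, N >= 2.
    """
    recon = np.asarray(recon_centers, dtype=float)
    metric = np.asarray(metric_centers, dtype=float)
    if recon.shape != metric.shape or recon.ndim != 2 or recon.shape[0] < 2:
        raise ValueError("need two matching (N, 3) arrays with N >= 2")

    # Indices of all unordered camera pairs (i < j).
    i, j = np.triu_indices(recon.shape[0], k=1)

    # Pairwise distances do not depend on the rigid transform between frames.
    d_recon = np.linalg.norm(recon[i] - recon[j], axis=1)
    d_metric = np.linalg.norm(metric[i] - metric[j], axis=1)

    # Skip near-coincident cameras to avoid dividing by (almost) zero.
    valid = d_recon > 1e-9
    if not np.any(valid):
        raise ValueError("reference cameras are degenerate (all coincide)")

    # The median ratio tolerates a minority of badly reconstructed cameras.
    return float(np.median(d_metric[valid] / d_recon[valid]))
```

Once the scale is fixed, the reconstructed 3D points can be multiplied by it, and the query pose can then be refined from 2D-3D correspondences with a standard PnP solver inside a RANSAC loop.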