CV · Mar 2

Coarse-to-Fine Monocular Re-Localization in OpenStreetMap via Semantic Alignment

Yuchen Zou, Xiao Hu, Dexing Zhong, Yuqing Tang

arXiv:2603.01613v1 · 1.5 · h-index: 26

Originality: Incremental advance

AI Analysis

This work addresses scalable and privacy-preserving localization for intelligent agents, though it appears incremental as it builds on existing semantic alignment and coarse-to-fine paradigms.

The paper tackles monocular re-localization using OpenStreetMap by addressing cross-modal discrepancies and computational costs, achieving improved accuracy and speed, with orientation recall outperforming state-of-the-art methods at stricter thresholds.

Monocular re-localization plays a crucial role in enabling intelligent agents to achieve human-like perception. However, traditional methods rely on dense maps, which face scalability limitations and privacy risks. OpenStreetMap (OSM), as a lightweight map that protects privacy, offers semantic and geometric information with global scalability. Nonetheless, there are still challenges in using OSM for localization: the inherent cross-modal discrepancies between natural images and OSM, as well as the high computational cost of global map-based localization. In this paper, we propose a hierarchical search framework with semantic alignment for localization in OSM. First, the semantic awareness capability of DINO-ViT is utilised to deconstruct visual elements to establish semantic relationships with OSM. Second, a coarse-to-fine search paradigm is designed to replace global dense matching, enabling efficient progressive refinement. Extensive experiments demonstrate that our method significantly improves both localization accuracy and speed. When trained on a single dataset, the 3° orientation recall of our method even outperforms the 5° recall of state-of-the-art methods.
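The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the parameters (`levels`, `grid`, `top_k`, `shrink`), and the toy score are all hypothetical. The toy score stands in for whatever image-to-map matching score a real system would compute. The sketch only shows the search pattern: evaluate a sparse grid of (x, y, heading) candidates, keep the best few, and re-sample a smaller window around each one instead of scoring every pose densely.

```python
import numpy as np


def toy_score(x, y, theta, peak=(37.0, -12.0, 75.0)):
    """Hypothetical stand-in for a matching score (higher is better).

    A smooth bump centred on `peak` = (x, y, heading in degrees).
    """
    # Signed angular difference wrapped into [-180, 180).
    dtheta = (theta - peak[2] + 180.0) % 360.0 - 180.0
    return -((x - peak[0]) ** 2 + (y - peak[1]) ** 2) / 50.0**2 - (dtheta / 90.0) ** 2


def coarse_to_fine_search(score_fn, extent, levels=3, grid=9, top_k=3, shrink=0.25):
    """Assumed illustration of hierarchical pose search, not the authors' method.

    extent: (x_min, x_max, y_min, y_max) of the map area to search.
    Returns (x, y, theta_deg, score) of the best candidate found.
    """
    x_min, x_max, y_min, y_max = extent
    # A region is (centre_x, centre_y, centre_theta, half_x, half_y, half_theta).
    regions = [((x_min + x_max) / 2, (y_min + y_max) / 2, 180.0,
                (x_max - x_min) / 2, (y_max - y_min) / 2, 180.0)]
    best = None
    for _ in range(levels):
        candidates = []
        for cx, cy, ct, hx, hy, ht in regions:
            xs = np.linspace(cx - hx, cx + hx, grid)
            ys = np.linspace(cy - hy, cy + hy, grid)
            ts = np.linspace(ct - ht, ct + ht, grid)
            X, Y, T = np.meshgrid(xs, ys, ts, indexing="ij")
            T = T % 360.0  # keep headings in [0, 360)
            S = score_fn(X, Y, T).ravel()
            Xf, Yf, Tf = X.ravel(), Y.ravel(), T.ravel()
            # Keep the top_k samples of this region as seeds for the next level.
            for i in np.argsort(S)[-top_k:]:
                candidates.append((float(S[i]), float(Xf[i]), float(Yf[i]), float(Tf[i]),
                                   hx * shrink, hy * shrink, ht * shrink))
        # Retain only the globally best seeds, then shrink the search window.
        candidates.sort(key=lambda c: c[0], reverse=True)
        candidates = candidates[:top_k]
        best = candidates[0]
        regions = [c[1:] for c in candidates]
    score, x, y, theta = best[0], best[1], best[2], best[3]
    return x, y, theta, score


if __name__ == "__main__":
    x, y, theta, score = coarse_to_fine_search(toy_score, (-100.0, 100.0, -100.0, 100.0))
    print(f"x={x:.2f}, y={y:.2f}, theta={theta:.2f} deg, score={score:.5f}")
```

With these defaults the search scores 7 regions of 9³ samples each (about 5,100 evaluations), whereas a dense grid at the final resolution would need far more. The trade-off is that a coarse level can miss a narrow peak, which is why several seeds (`top_k`) are carried forward rather than only the single best one.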
