CV | RO | Sep 9, 2021

Learning Cross-Scale Visual Representations for Real-Time Image Geo-Localization

arXiv:2109.04087v2 | 10 citations
AI Analysis

This paper addresses robot localization in GPS-denied environments, offering an incremental improvement with specific efficiency gains.

The paper tackles robot localization in GPS-denied environments by developing a framework that learns cross-scale visual representations from image observations to match against 2D multi-modal geospatial maps, achieving better performance on smaller-scale maps and higher computational efficiency for real-time use.

Robot localization remains a challenging task in GPS-denied environments. State estimation approaches based on local sensors, e.g. cameras or IMUs, are drift-prone on long-range missions because error accumulates. In this study, we aim to address this problem by localizing image observations in a 2D multi-modal geospatial map. We introduce a cross-scale dataset and a methodology to produce additional data from cross-modality sources. We propose a framework that learns cross-scale visual representations without supervision. Experiments are conducted on data from two different domains, underwater and aerial. In contrast to existing studies in cross-view image geo-localization, our approach a) performs better on smaller-scale multi-modal maps; b) is more computationally efficient for real-time applications; c) can serve directly in concert with state estimation pipelines.
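To make the idea of localizing an image observation in a 2D map concrete, here is a minimal sketch of retrieval-style matching. It is not the paper's framework: it assumes that some learned encoder (not shown) has already produced one embedding per map tile and one for the observation, and it simply picks the tile with the highest cosine similarity. The names `localize`, `tile_embeddings`, and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def localize(query_embedding, tile_embeddings):
    """Return ((row, col), score) of the map tile most similar to the query.

    tile_embeddings maps a hypothetical (row, col) grid position to the
    embedding of the map tile at that position.
    """
    if not tile_embeddings:
        raise ValueError("tile_embeddings must not be empty")
    best_position = None
    best_score = -2.0  # cosine similarity is always >= -1
    for position, embedding in tile_embeddings.items():
        score = cosine_similarity(query_embedding, embedding)
        if score > best_score:
            best_position, best_score = position, score
    return best_position, best_score


# Toy data standing in for encoder outputs (assumed, not from the paper).
tiles = {
    (0, 0): [1.0, 0.0, 0.0],
    (0, 1): [0.0, 1.0, 0.0],
    (1, 0): [0.0, 0.0, 1.0],
    (1, 1): [0.7, 0.7, 0.0],
}
query = [0.1, 0.9, 0.05]

position, score = localize(query, tiles)
print(position, round(score, 3))
```

In a pipeline like the one the abstract describes, the matched tile position could then serve as an absolute position fix that a state estimator fuses with drifting local odometry. How the paper actually performs the matching and the fusion is not stated in this summary.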

Code Implementations: 1 repo