CVJun 30, 2019

Large-scale, real-time visual-inertial localization revisited

Simon Lynen, Bernhard Zeisl, Dror Aiger, Michael Bosse, Joel Hesch, Marc Pollefeys, Roland Siegwart, Torsten Sattler

arXiv:1907.00338v117.5100 citations

Originality Incremental advance

AI Analysis

This work addresses scalability and latency issues for real-world applications like robot navigation and augmented reality, but it is incremental as it builds on existing local features and sparse 3D models.

The paper tackles the challenge of achieving large-scale, real-time visual-inertial localization by developing a system that compresses scene appearance and geometry for efficient storage and lookup, enabling deployment at unprecedented scale. It demonstrates effectiveness by localizing 2.5 million images across four cities with query latencies around 200ms.

The overarching goals in image-based localization are scale, robustness and speed. In recent years, approaches based on local features and sparse 3D point-cloud models have both dominated the benchmarks and seen successful realworld deployment. They enable applications ranging from robot navigation, autonomous driving, virtual and augmented reality to device geo-localization. Recently end-to-end learned localization approaches have been proposed which show promising results on small scale datasets. However the positioning accuracy, scalability, latency and compute & storage requirements of these approaches remain open challenges. We aim to deploy localization at global-scale where one thus relies on methods using local features and sparse 3D models. Our approach spans from offline model building to real-time client-side pose fusion. The system compresses appearance and geometry of the scene for efficient model storage and lookup leading to scalability beyond what what has been previously demonstrated. It allows for low-latency localization queries and efficient fusion run in real-time on mobile platforms by combining server-side localization with real-time visual-inertial-based camera pose tracking. In order to further improve efficiency we leverage a combination of priors, nearest neighbor search, geometric match culling and a cascaded pose candidate refinement step. This combination outperforms previous approaches when working with large scale models and allows deployment at unprecedented scale. We demonstrate the effectiveness of our approach on a proof-of-concept system localizing 2.5 million images against models from four cities in different regions on the world achieving query latencies in the 200ms range.

View on arXiv PDF

Similar