CV · Aug 4, 2019

To Learn or Not to Learn: Visual Localization from Essential Matrices

arXiv:1908.01293v2 · 116 citations
AI Analysis

This paper addresses the trade-off between accuracy and adaptability in visual localization for applications such as self-driving cars and Mixed Reality, offering incremental insights.

The paper tackles the problem of visual localization, where scene-specific methods are accurate but costly to adapt to new scenes, while deep learning approaches based on relative pose estimation adapt easily but are less accurate. The authors propose a framework for localization from relative poses, show that a classical feature-based approach within it reaches state-of-the-art performance, and then swap in learned alternatives at various levels to analyze why learned methods underperform.

Visual localization is the problem of estimating a camera's pose within a scene and is a key component in computer vision applications such as self-driving cars and Mixed Reality. State-of-the-art approaches for accurate visual localization use scene-specific representations, resulting in the overhead of constructing these models when applying the techniques to new scenes. Recently, deep learning-based approaches based on relative pose estimation have been proposed, carrying the promise of easily adapting to new scenes. However, it has been shown that such approaches are currently significantly less accurate than state-of-the-art approaches. In this paper, we are interested in analyzing this behavior. To this end, we propose a novel framework for visual localization from relative poses. Using a classical feature-based approach within this framework, we show state-of-the-art performance. Replacing the classical approach with learned alternatives at various levels, we then identify the reasons why deep learned approaches do not perform well. Based on our analysis, we make recommendations for future work.
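For readers unfamiliar with the term in the title, the following is a minimal sketch of how an essential matrix relates to a relative camera pose. It is not the authors' framework. It illustrates only the textbook relation E = [t]_x R and the epipolar constraint x2^T E x1 = 0, under the assumed convention X2 = R X1 + t. The function names, the example pose, and the 3D points are all hypothetical.

```python
import numpy as np

def skew(t):
    """Return the cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz,  ty],
                     [ tz, 0.0, -tx],
                     [-ty,  tx, 0.0]])

def essential_from_relative_pose(R, t):
    """Essential matrix E = [t]_x R, assuming the convention X2 = R @ X1 + t."""
    return skew(t) @ R

def rotation_about_y(angle_rad):
    """Rotation matrix for a rotation about the y axis (used only to build an example)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[  c, 0.0,   s],
                     [0.0, 1.0, 0.0],
                     [ -s, 0.0,   c]])

# Hypothetical relative pose between two cameras (illustrative values only).
R = rotation_about_y(np.deg2rad(10.0))
t = np.array([1.0, 0.0, 0.2])
E = essential_from_relative_pose(R, t)

# Hypothetical 3D points expressed in the first camera's frame.
X1 = np.array([[ 0.5, -0.2, 4.0],
               [-1.0,  0.3, 6.0],
               [ 0.2,  0.8, 5.0]])
X2 = X1 @ R.T + t  # the same points in the second camera's frame

# Normalized image coordinates (homogeneous, with the last component equal to 1).
x1 = X1 / X1[:, 2:3]
x2 = X2 / X2[:, 2:3]

# Epipolar residual x2^T E x1 for each correspondence; zero up to rounding error.
residuals = np.einsum("ni,ij,nj->n", x2, E, x1)
print(residuals)
```

Because E depends on t only up to scale, a single essential matrix fixes the direction of the translation between two cameras but not its length.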

