CV | May 5, 2023

HSCNet++: Hierarchical Scene Coordinate Classification and Regression for Visual Localization with Transformer

arXiv:2305.03595v1 | 40 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of accurate visual localization for applications in computer vision and robotics, representing an incremental improvement over previous methods.

The paper tackles single-image RGB visual localization by introducing HSCNet++, a hierarchical scene coordinate network that predicts pixel scene coordinates in a coarse-to-fine manner, setting new state-of-the-art results on multiple datasets including 7-Scenes, 12 Scenes, and Cambridge Landmarks.

Visual localization is critical to many applications in computer vision and robotics. To address single-image RGB localization, state-of-the-art feature-based methods match local descriptors between a query image and a pre-built 3D model. Recently, deep neural networks have been exploited to regress the mapping between raw pixels and 3D coordinates in the scene, and thus the matching is implicitly performed by the forward pass through the network. However, in a large and ambiguous environment, learning such a regression task directly can be difficult for a single network. In this work, we present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image. The proposed method, which is an extension of HSCNet, allows us to train compact models which scale robustly to large environments. It sets a new state-of-the-art for single-image localization on the 7-Scenes, 12 Scenes, and Cambridge Landmarks datasets, as well as on the combined indoor scenes.
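To make the coarse-to-fine idea concrete, the following is a minimal sketch of how a 3D scene coordinate might be split into a coarse classification target (a region label) and a fine regression target (an offset inside that region), and then recombined. It is an illustration under stated assumptions, not the paper's implementation: the uniform grid, the single hierarchy level, and the names `encode`, `decode`, `cell_size`, and `cells_per_axis` are all hypothetical, and the summary above does not say how HSCNet++ actually builds its regions.

```python
# Illustrative sketch only. A uniform grid stands in for whatever region
# hierarchy the real method uses; every name here is hypothetical.

def _cell_center(idx, origin, cell_size):
    """Center of the grid cell with integer index idx = (ix, iy, iz)."""
    return [o + (i + 0.5) * cell_size for o, i in zip(origin, idx)]


def encode(coord, origin, cell_size, cells_per_axis):
    """Split a 3D coordinate into (coarse region label, fine residual)."""
    idx = []
    for c, o in zip(coord, origin):
        i = int((c - o) // cell_size)
        # Clamp so points on or beyond the boundary still get a valid label.
        idx.append(max(0, min(cells_per_axis - 1, i)))
    # Flatten (ix, iy, iz) into one class label for the coarse classifier.
    label = (idx[0] * cells_per_axis + idx[1]) * cells_per_axis + idx[2]
    center = _cell_center(idx, origin, cell_size)
    # The fine stage only has to regress a small offset from the cell center.
    residual = [c - m for c, m in zip(coord, center)]
    return label, residual


def decode(label, residual, origin, cell_size, cells_per_axis):
    """Recombine a region label and a residual into a 3D coordinate."""
    iz = label % cells_per_axis
    iy = (label // cells_per_axis) % cells_per_axis
    ix = label // (cells_per_axis * cells_per_axis)
    center = _cell_center((ix, iy, iz), origin, cell_size)
    return [m + r for m, r in zip(center, residual)]


if __name__ == "__main__":
    origin = [0.0, 0.0, 0.0]
    label, residual = encode([1.3, 0.2, 2.9], origin, 1.0, 4)
    print(label, residual)
    print(decode(label, residual, origin, 1.0, 4))
```

In a learned system, a network would predict the label and the residual per pixel, and `decode` would turn those predictions back into scene coordinates for pose estimation. The point of the split is that the residual has a much smaller range than the raw coordinate, which is one plausible reason a hierarchical formulation is easier to learn in large or ambiguous scenes.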

