RO | CV | Dec 9, 2024

Enhancing Scene Coordinate Regression with Efficient Keypoint Detection and Sequential Information

arXiv:2412.06488v2 | 3 citations | h-index: 12 | Has Code | IEEE Robot Autom Lett
Originality: Incremental advance
AI Analysis

This work addresses challenges in visual localization for robotics and AR/VR applications, offering incremental improvements over existing SCR methods.

The paper addresses the difficulty that Scene Coordinate Regression (SCR) has with repetitive textures and meaningless (uninformative) areas in visual localization. It proposes a unified architecture for scene encoding and keypoint detection, and it uses sequential information during mapping and relocalization. The reported result is better efficiency and accuracy: relative to the baseline, the single-frame mode raises recall by 6.4% and running speed from 56 Hz to 90 Hz, and the sequence-based mode raises recall by 11% while keeping the baseline's efficiency.

Scene Coordinate Regression (SCR) is a visual localization technique that uses deep neural networks (DNNs) to directly regress 2D-3D correspondences for camera pose estimation. However, current SCR methods often struggle with repetitive textures and meaningless areas because they rely on implicit triangulation. In this paper, we propose an efficient and accurate SCR system. Unlike existing SCR methods, our system uses a unified architecture for both scene encoding and salient keypoint detection, which lets it prioritize the encoding of informative regions. This design significantly improves computational efficiency. Additionally, we introduce a mechanism that uses sequential information during both mapping and relocalization. This mechanism strengthens the implicit triangulation, especially in environments with repetitive textures. Comprehensive experiments on indoor and outdoor datasets demonstrate that the proposed system outperforms state-of-the-art (SOTA) SCR methods. Our single-frame relocalization mode improves the recall rate of our baseline by 6.4% and increases the running speed from 56 Hz to 90 Hz. Furthermore, our sequence-based mode increases the recall rate by 11% while maintaining the original efficiency.
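
The general SCR pipeline described in the abstract (per-pixel 3D scene coordinates, restricted to salient keypoints, then pose estimation from the resulting 2D-3D correspondences) can be illustrated with a minimal sketch. This is not the paper's implementation. The scene coordinate map and keypoint score map are synthetic stand-ins for network outputs, the function names `select_keypoints` and `pose_from_correspondences` are hypothetical, and the pose solver is a plain noise-free DLT, whereas practical systems typically use a robust PnP solver with RANSAC.

```python
import numpy as np

def select_keypoints(score_map, k):
    """Return (rows, cols) of the k highest-scoring pixels (hypothetical helper)."""
    flat = np.argsort(score_map.ravel())[::-1][:k]
    return np.unravel_index(flat, score_map.shape)

def pose_from_correspondences(pixels, points3d, K):
    """Estimate a world-to-camera pose (R, t) from 2D-3D pairs with a plain DLT.

    Assumes at least 6 noise-free, non-coplanar correspondences.
    """
    n = len(pixels)
    Xh = np.hstack([points3d, np.ones((n, 1))])            # homogeneous 3D points
    # Remove the known intrinsics so the unknown matrix is proportional to [R | t].
    xn = (np.linalg.inv(K) @ np.hstack([pixels, np.ones((n, 1))]).T).T
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh
    A[0::2, 8:12] = -xn[:, [0]] * Xh
    A[1::2, 4:8] = Xh
    A[1::2, 8:12] = -xn[:, [1]] * Xh
    _, _, Vt = np.linalg.svd(A)
    M = Vt[-1].reshape(3, 4)                               # c * [R | t], c unknown
    U, S, Vt2 = np.linalg.svd(M[:, :3])
    R = U @ Vt2                                            # nearest orthogonal matrix
    scale = S.mean()
    if np.linalg.det(R) < 0:                               # resolve the sign of c
        R, scale = -R, -scale
    t = M[:, 3] / scale
    return R, t

def demo(seed=0, k=50):
    """Build a synthetic scene coordinate map and recover the camera pose."""
    rng = np.random.default_rng(seed)
    H, W = 48, 64
    K = np.array([[60.0, 0.0, W / 2], [0.0, 60.0, H / 2], [0.0, 0.0, 1.0]])
    a = 0.3                                                # rotation about the y axis
    R_gt = np.array([[np.cos(a), 0.0, np.sin(a)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(a), 0.0, np.cos(a)]])
    t_gt = np.array([0.2, -0.1, 0.5])
    v, u = np.mgrid[0:H, 0:W]
    depth = rng.uniform(2.0, 6.0, size=(H, W))
    rays = np.stack([u, v, np.ones_like(u)], axis=-1) @ np.linalg.inv(K).T
    cam_pts = rays * depth[..., None]
    # Stand-in for the regressed scene coordinates: X_world = R^T (X_cam - t).
    coord_map = (cam_pts - t_gt) @ R_gt
    # Stand-in for the keypoint detector's per-pixel saliency scores.
    score_map = rng.random((H, W))
    rows, cols = select_keypoints(score_map, k)
    pixels = np.stack([cols, rows], axis=1).astype(float)  # (u, v) order
    R, t = pose_from_correspondences(pixels, coord_map[rows, cols], K)
    return R, t, R_gt, t_gt
```

Calling `demo()` returns the estimated and ground-truth poses for comparison. Solving with only the top-`k` pixels rather than the full map mirrors the idea of prioritizing informative regions, though the paper's actual selection and solver may differ.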

Code Implementations: 1 repo
