Enhancing Scene Coordinate Regression with Efficient Keypoint Detection and Sequential Information
This work addresses challenges in visual localization for robotics and AR/VR applications, offering incremental improvements over existing SCR methods.
The paper addresses the difficulty that Scene Coordinate Regression (SCR) has with repetitive textures and uninformative areas in visual localization. It proposes a unified architecture for scene encoding and keypoint detection, together with a mechanism that uses sequential information. The result is improved computational efficiency and accuracy: the single-frame mode increases recall by 6.4% and raises speed from 56 Hz to 90 Hz, and the sequence-based mode increases recall by 11% while maintaining efficiency.
Scene Coordinate Regression (SCR) is a visual localization technique that uses deep neural networks (DNNs) to directly regress 2D-3D correspondences for camera pose estimation. However, current SCR methods often struggle with repetitive textures and uninformative areas because they rely on implicit triangulation. In this paper, we propose an efficient and accurate SCR system. Unlike existing SCR methods, our system uses a unified architecture for both scene encoding and salient keypoint detection, which lets it prioritize the encoding of informative regions and significantly improves computational efficiency. Additionally, we introduce a mechanism that exploits sequential information during both mapping and relocalization. This mechanism strengthens the implicit triangulation, especially in environments with repetitive textures. Comprehensive experiments on indoor and outdoor datasets demonstrate that the proposed system outperforms state-of-the-art (SOTA) SCR methods. Our single-frame relocalization mode improves the recall rate of our baseline by 6.4% and increases the running speed from 56 Hz to 90 Hz. Furthermore, our sequence-based mode increases the recall rate by 11% while maintaining the original efficiency.
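To make the idea of prioritizing informative regions concrete, the following is a minimal sketch, not the paper's implementation. It assumes that each regressed 2D-3D correspondence carries a per-pixel saliency score and keeps only the most salient ones before they would be passed to a pose solver such as PnP with RANSAC. The names `Correspondence` and `select_salient`, the score range, and the threshold value are all hypothetical. In the proposed system, detection and encoding happen inside a single network rather than as a separate filtering step.

```python
from typing import List, NamedTuple, Optional, Tuple


class Correspondence(NamedTuple):
    """One regressed 2D-3D match (hypothetical container for illustration)."""
    pixel: Tuple[float, float]                # 2D image location (u, v)
    scene_point: Tuple[float, float, float]   # regressed 3D scene coordinate
    saliency: float                           # assumed keypoint score in [0, 1]


def select_salient(
    correspondences: List[Correspondence],
    min_saliency: float = 0.5,                # assumed threshold, not from the paper
    max_points: Optional[int] = None,
) -> List[Correspondence]:
    """Keep correspondences at or above the threshold, most salient first."""
    kept = [c for c in correspondences if c.saliency >= min_saliency]
    kept.sort(key=lambda c: c.saliency, reverse=True)
    if max_points is not None:
        kept = kept[:max_points]
    return kept


# Toy data: two textured pixels and one low-saliency pixel (e.g. a blank wall).
demo = [
    Correspondence((10.0, 20.0), (1.0, 2.0, 3.0), 0.9),
    Correspondence((30.0, 40.0), (4.0, 5.0, 6.0), 0.1),
    Correspondence((50.0, 60.0), (7.0, 8.0, 9.0), 0.6),
]
salient = select_salient(demo)
```

Passing fewer, more reliable correspondences to the pose solver is one plausible way in which keypoint-guided selection could reduce computation. The reported speedup from 56 Hz to 90 Hz is a measured result of the full system and is not reproduced by this sketch.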