CV, Jul 10, 2024

Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization

arXiv:2407.08023v11 citations, h-index: 14, Has Code
AI Analysis

This work addresses egocentric localization for robotics or AR/VR applications, but it is incremental as it builds directly on prior methods like EgoLoc.

The paper tackles the VQ3D task by improving camera pose estimation through a hybrid structure-from-motion and camera relocalization pipeline, achieving a 1.5% increase in overall success rate over the previous state-of-the-art.

We build our pipeline, EgoLoc-v1, mainly inspired by EgoLoc. We propose a model ensemble strategy to improve the camera pose estimation part of the VQ3D task, which previous work has shown to be essential. The core idea is to perform not only SfM on egocentric videos but also 2D-3D matching between existing 3D scans and 2D video frames. This yields a hybrid SfM and camera relocalization pipeline that provides more camera poses, leading to a higher QwP and overall success rate. Our method achieves the best performance on the most important metric, the overall success rate, surpassing the previous state of the art, the competitive EgoLoc, by 1.5%. The code is available at https://github.com/Wayne-Mai/egoloc_v1.
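The hybrid idea in the abstract, combining poses from SfM with poses from relocalization against a 3D scan so that more frames end up with a pose, can be illustrated with a minimal sketch. Everything here is an assumption for illustration and is not taken from the EgoLoc-v1 code: the function names `merge_poses` and `pose_coverage` are hypothetical, both pose sources are assumed to be already expressed in the same scan coordinate frame, and the rule "prefer SfM, fall back to relocalization" is one plausible way to combine them, not necessarily the authors' ensemble strategy. The coverage value is only a frame-level proxy and not the definition of the QwP metric.

```python
from typing import Any, Dict, Sequence

# A pose is left opaque here (e.g. a 4x4 camera-to-world matrix);
# this sketch only cares about which frames have one.
Pose = Any


def merge_poses(sfm_poses: Dict[int, Pose],
                reloc_poses: Dict[int, Pose]) -> Dict[int, Pose]:
    """Combine per-frame poses from two sources.

    Assumption: both sources use the same coordinate frame, and an SfM
    pose is preferred whenever both sources cover the same frame.
    """
    merged = dict(reloc_poses)   # start with relocalization poses
    merged.update(sfm_poses)     # SfM poses overwrite on overlap
    return merged


def pose_coverage(poses: Dict[int, Pose], frame_ids: Sequence[int]) -> float:
    """Fraction of the requested frames that have a pose."""
    if not frame_ids:
        return 0.0
    return sum(1 for f in frame_ids if f in poses) / len(frame_ids)


# Toy example: SfM registers frames 0 and 2, relocalization frames 1 and 2.
sfm = {0: "sfm_pose_0", 2: "sfm_pose_2"}
reloc = {1: "reloc_pose_1", 2: "reloc_pose_2"}
merged = merge_poses(sfm, reloc)

frames = [0, 1, 2, 3]
print(pose_coverage(sfm, frames))     # coverage with SfM alone
print(pose_coverage(merged, frames))  # coverage with both sources
```

In this toy example the merged set covers more frames than SfM alone, which is the mechanism the abstract credits for the higher QwP and overall success rate.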

Code Implementations: 1 repo