Reprojection Errors as Prompts for Efficient Scene Coordinate Regression
This work addresses training inefficiencies in visual localization for robotics and AR/VR applications. The contribution is incremental: it builds on existing scene coordinate regression (SCR) methods and adds a new sampling mechanism.
The paper tackles scene coordinate regression (SCR) for visual localization. It targets the inefficiency of training on all image regions, including dynamic objects and texture-less areas, and introduces an error-guided feature selection mechanism that uses the Segment Anything Model (SAM) to filter out problematic areas. The reported result is improved performance over existing SCR methods on the Cambridge Landmarks and Indoor6 datasets.
Scene coordinate regression (SCR) methods have emerged as a promising area of research due to their potential for accurate visual localization. However, many existing SCR approaches train on samples from all image regions, including dynamic objects and texture-less areas. Using these areas for optimization during training can hamper the overall performance and efficiency of the model. In this study, we first perform an in-depth analysis to validate the adverse impacts of these areas. Drawing inspiration from our analysis, we then introduce an error-guided feature selection (EGFS) mechanism, used in tandem with the Segment Anything Model (SAM). This mechanism seeds areas with low reprojection error as prompts and expands them into error-guided masks, then uses these masks to sample points and filter out problematic areas in an iterative manner. Experiments on the Cambridge Landmarks and Indoor6 datasets demonstrate that our method outperforms existing SCR approaches that do not rely on 3D information.
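The seeding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a pinhole camera with known intrinsics `K` and a world-to-camera pose `(R, t)`, and the function names, the 2-pixel threshold, and the seed count are hypothetical choices made for illustration. The expansion of seeds into masks by SAM is not shown.

```python
import numpy as np


def reprojection_errors(scene_coords, pixels, K, R, t):
    """Per-sample reprojection error in pixels.

    scene_coords: (N, 3) predicted 3D points in the world frame.
    pixels:       (N, 2) image locations at which the points were predicted.
    K:            (3, 3) pinhole intrinsics (assumed known).
    R, t:         world-to-camera rotation (3, 3) and translation (3,).
    """
    cam = scene_coords @ R.T + t                 # world frame -> camera frame
    depth = cam[:, 2]
    proj = cam @ K.T                             # homogeneous pixel coordinates
    valid = depth > 1e-6                         # points in front of the camera
    safe_depth = np.where(valid, depth, 1.0)     # avoid division by zero
    uv = proj[:, :2] / safe_depth[:, None]
    err = np.linalg.norm(uv - pixels, axis=1)
    err[~valid] = np.inf                         # never treat invalid points as reliable
    return err


def select_seed_prompts(errors, pixels, threshold=2.0, max_seeds=8):
    """Return up to max_seeds pixel locations with the lowest error below threshold.

    The threshold and seed count are illustrative assumptions, not values
    taken from the paper.
    """
    order = np.argsort(errors)                   # lowest error first
    keep = order[errors[order] < threshold][:max_seeds]
    return pixels[keep]
```

Under these assumptions, the returned pixel locations would serve as point prompts for a promptable segmenter such as SAM. The resulting masks would then restrict where training samples are drawn in the next iteration.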