Leveraging Neural Radiance Field in Descriptor Synthesis for Keypoints Scene Coordinate Regression
This work addresses data scarcity in visual localization for applications such as robotics and AR. It offers a modular solution, although its contribution is incremental because it builds on existing KSCR and NeRF methods.
The paper tackles the problem of limited data hindering keypoint scene coordinate regression (KSCR) for visual localization. It proposes a pipeline that uses a Neural Radiance Field (NeRF) to synthesize keypoint descriptors from views rendered at novel poses, which improves generalization in data-scarce environments and boosts localization accuracy by up to 50%.
Classical structure-based visual localization methods offer high accuracy but involve trade-offs in storage, speed, and privacy. A recent keypoint scene coordinate regression (KSCR) method, D2S, addresses these issues by using graph attention networks to model relationships between keypoints and a simple multilayer perceptron (MLP) to predict their 3D coordinates. The camera pose is then estimated with PnP+RANSAC from the resulting 2D-3D correspondences. Although KSCR achieves competitive results, rivaling state-of-the-art image-retrieval methods such as HLoc across multiple benchmarks, its performance degrades when training samples are limited, because the deep learning model relies on extensive data. This paper addresses this challenge by introducing a pipeline for keypoint descriptor synthesis using a Neural Radiance Field (NeRF). By generating novel poses and feeding them into a trained NeRF model to render new views, our approach improves KSCR's generalization in data-scarce environments. The proposed system can improve localization accuracy by up to 50% while requiring only a fraction of the time for data synthesis. Furthermore, its modular design allows multiple NeRFs to be integrated, offering a versatile and efficient solution for visual localization. The implementation is publicly available at: https://github.com/ais-lab/DescriptorSynthesis4Feat2Map.
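The pose-generation step of the pipeline can be sketched in stdlib-only Python. This sketch makes assumptions that the abstract does not state. Novel poses are drawn by perturbing each training camera pose with a small random rotation (up to `max_angle_deg`) and a translation jitter (up to `max_shift`). The function names, parameters, and this perturbation scheme are hypothetical and are not necessarily the paper's sampling strategy. The later steps are also omitted: rendering each pose with a trained NeRF and extracting keypoint descriptors from the rendered view.

```python
import math
import random
from typing import List, Tuple

Matrix3 = List[List[float]]
Vector3 = List[float]
# Hypothetical pose representation: (camera-to-world rotation R, camera center t in world coordinates)
Pose = Tuple[Matrix3, Vector3]


def axis_angle_to_matrix(axis: Vector3, angle: float) -> Matrix3:
    """Rodrigues' formula: rotation of `angle` radians about the unit vector `axis`."""
    x, y, z = axis
    c, s = math.cos(angle), math.sin(angle)
    C = 1.0 - c
    return [
        [c + x * x * C, x * y * C - z * s, x * z * C + y * s],
        [y * x * C + z * s, c + y * y * C, y * z * C - x * s],
        [z * x * C - y * s, z * y * C + x * s, c + z * z * C],
    ]


def matmul(a, b) -> Matrix3:
    """Product of two 3x3 matrices given as nested sequences."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def random_unit_vector(rng: random.Random) -> Vector3:
    """Uniformly distributed direction on the unit sphere (normalized Gaussian sample)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(comp * comp for comp in v))
        if n > 1e-9:
            return [comp / n for comp in v]


def sample_novel_poses(train_poses: List[Pose], n_per_pose: int,
                       max_angle_deg: float = 10.0, max_shift: float = 0.2,
                       seed: int = 0) -> List[Pose]:
    """Hypothetical novel-pose sampler: jitter each training pose n_per_pose times."""
    rng = random.Random(seed)  # seeded so the synthetic set is reproducible
    novel: List[Pose] = []
    for R, t in train_poses:
        for _ in range(n_per_pose):
            angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
            dR = axis_angle_to_matrix(random_unit_vector(rng), angle)
            R_new = matmul(R, dR)  # apply the perturbation in the camera's local frame
            t_new = [ti + rng.uniform(-max_shift, max_shift) for ti in t]
            novel.append((R_new, t_new))
    return novel


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    poses = sample_novel_poses([(identity, [0.0, 0.0, 0.0])], n_per_pose=5)
    print(len(poses))  # 5
```

Keeping the perturbations small keeps the rendered views close to the training distribution, where a NeRF typically renders more faithfully. Whether this matches the paper's choice of bounds is an assumption.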