CV | Apr 11, 2025

HAL-NeRF: High Accuracy Localization Leveraging Neural Radiance Fields

Asterios Reppas, Grigorios-Aris Cheimariotis, Panos K. Papadopoulos, Panagiotis Frasiolas, Dimitrios Zarpalas

arXiv:2504.08901v1 | 3.61 citations | h-index: 10 | 2025 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIxVR)

Originality: Incremental advance

AI Analysis

This work addresses the need for precise camera localization in XR and robotics, offering a method that significantly improves accuracy over existing approaches, though it is incremental as it builds on prior techniques like APR and NeRF.

The paper tackles the problem of achieving high-accuracy camera localization using only camera captures, presenting HAL-NeRF, which combines a CNN pose regressor with a NeRF-based refinement module. It achieves state-of-the-art results with translation errors of 0.025m and 0.04m and rotation errors of 0.59 and 0.58 degrees on the 7-Scenes and Cambridge Landmarks datasets, respectively.
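Figures like these are conventionally reported as the average of the per-scene median errors. The following is a minimal sketch of that aggregation, assuming per-frame errors are already grouped by scene. The function name, scene names and error values are hypothetical placeholders, not data from the paper.

```python
from statistics import mean, median


def average_of_scene_medians(errors_by_scene):
    """Take the median error within each scene, then average across scenes."""
    return mean(median(errors) for errors in errors_by_scene.values())


# Hypothetical per-frame translation errors in metres (illustrative only).
translation_errors = {
    "scene_a": [0.01, 0.02, 0.30],
    "scene_b": [0.03, 0.04, 0.05],
}

score = average_of_scene_medians(translation_errors)
print(f"average of per-scene medians: {score:.3f} m")
```

Taking the median inside each scene keeps a few badly localized frames from dominating, which is why the outlier in `scene_a` barely affects the result.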

Precise camera localization is a critical task in XR applications and robotics. Using only camera captures as input is an inexpensive option that enables localization in large indoor and outdoor environments, but it makes high accuracy hard to achieve. Specifically, camera relocalization methods such as Absolute Pose Regression (APR) can have a median translation error of more than 0.5 m in outdoor scenes. This paper presents HAL-NeRF, a high-accuracy localization method that combines a CNN pose regressor with a refinement module based on a Monte Carlo particle filter. The Nerfacto model, an implementation of Neural Radiance Fields (NeRFs), is used both to augment the training data for the pose regressor and to measure photometric loss in the particle filter refinement module. HAL-NeRF leverages Nerfacto's ability to synthesize high-quality novel views, significantly improving the performance of the localization pipeline. HAL-NeRF achieves state-of-the-art results under the conventional metric, the average of the per-scene median errors: a translation error of 0.025 m and a rotation error of 0.59 degrees on the 7-Scenes dataset, and 0.04 m and 0.58 degrees on the Cambridge Landmarks dataset, at the cost of increased computation time. This work highlights the potential of combining APR with NeRF-based refinement techniques to advance monocular camera relocalization accuracy.
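The refinement idea in the abstract (start from a coarse pose, score candidate poses by photometric loss against rendered views, and resample) can be illustrated with a toy particle filter. This is a sketch under strong assumptions, not the authors' implementation: the pose is reduced to a 2D translation, `render` is a hypothetical analytic stand-in for a NeRF renderer such as Nerfacto, and every parameter value (particle count, noise schedule, temperature) is chosen for illustration only.

```python
import math
import random


def render(pose):
    """Hypothetical stand-in for a NeRF renderer: maps a 2D pose to a tiny 'image'."""
    x, y = pose
    return [math.sin(x + 0.5 * k) + math.cos(y - 0.3 * k) for k in range(16)]


def photometric_loss(image_a, image_b):
    """Mean squared intensity difference between two images."""
    return sum((a - b) ** 2 for a, b in zip(image_a, image_b)) / len(image_a)


def refine_pose(observed, initial_pose, n_particles=200, n_iters=30,
                init_sigma=0.3, temperature=0.01, seed=0):
    """Toy Monte Carlo particle filter that refines a coarse pose estimate."""
    rng = random.Random(seed)
    # Spread particles around the coarse estimate (e.g. a pose regressor output).
    particles = [(initial_pose[0] + rng.gauss(0, init_sigma),
                  initial_pose[1] + rng.gauss(0, init_sigma))
                 for _ in range(n_particles)]
    sigma = init_sigma
    for _ in range(n_iters):
        # Weight each particle by how well its rendered view matches the observation.
        losses = [photometric_loss(render(p), observed) for p in particles]
        best = min(losses)
        weights = [math.exp(-(loss - best) / temperature) for loss in losses]
        # Resample in proportion to the weights, then perturb with shrinking noise.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        sigma *= 0.8
        particles = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                     for x, y in particles]
    # Report the particle mean as the refined pose.
    return (sum(p[0] for p in particles) / n_particles,
            sum(p[1] for p in particles) / n_particles)


true_pose = (1.0, -0.5)    # assumed ground truth for the toy example
coarse_pose = (1.2, -0.3)  # assumed coarse estimate to be refined
observed_image = render(true_pose)
refined = refine_pose(observed_image, coarse_pose)
print("error before:", math.dist(coarse_pose, true_pose))
print("error after: ", math.dist(refined, true_pose))
```

Each iteration renders one view per particle, so the cost grows with the particle count and the number of iterations. That is consistent with the trade-off of increased computation time noted in the abstract.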
