Pose-Free Neural Radiance Fields via Implicit Pose Regularization
This work addresses a robustness issue in pose-free NeRF for 3D scene reconstruction, which matters for applications in computer vision and graphics. The contribution is incremental, however, since it builds on existing pose-free NeRF pipelines.
The paper tackles the problem of training neural radiance fields (NeRF) from unposed multi-view images. Existing methods suffer from inaccurate pose estimation because of the domain gap between rendered and real images. The paper introduces IR-NeRF, which uses implicit pose regularization to improve robustness, and reports novel view synthesis that outperforms state-of-the-art methods across synthetic and real datasets.
Pose-free neural radiance fields (NeRF) aim to train NeRF with unposed multi-view images and have achieved impressive success in recent years. Most existing works share a two-stage pipeline: first train a coarse pose estimator with rendered images, then jointly optimize the estimated poses and the neural radiance field. However, because the pose estimator is trained only on rendered images, its estimates are usually biased or inaccurate for real images due to the domain gap between real and rendered images. This leads to poor robustness when estimating the poses of real images and, in turn, to local minima in the joint optimization. We design IR-NeRF, an innovative pose-free NeRF that introduces implicit pose regularization to refine the pose estimator with unposed real images and improve the robustness of pose estimation for real images. Given a collection of 2D images of a specific scene, IR-NeRF constructs a scene codebook that stores scene features and implicitly captures the scene-specific pose distribution as priors. These scene priors promote the robustness of pose estimation, following the rationale that a 2D real image can be well reconstructed from the scene codebook only when its estimated pose lies within the pose distribution. Extensive experiments show that IR-NeRF achieves superior novel view synthesis and consistently outperforms the state of the art across multiple synthetic and real datasets.
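The codebook rationale can be illustrated with a toy example. The sketch below is not the IR-NeRF implementation: it assumes a generic vector-quantization style codebook with nearest-neighbour lookup, and the function names (quantize, reconstruction_error), the feature dimensions, and the synthetic data are all hypothetical. It only shows why reconstruction error from a fixed codebook can act as a signal: features close to what the codebook stores reconstruct well, while features far from it do not.

```python
import numpy as np


def quantize(features, codebook):
    """Replace each feature vector with its nearest codebook entry.

    features: array of shape (n, d); codebook: array of shape (k, d).
    Returns the quantized features (n, d) and the chosen indices (n,).
    """
    # Pairwise squared Euclidean distances, shape (n, k).
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return codebook[idx], idx


def reconstruction_error(features, codebook):
    """Mean squared error between features and their quantized version."""
    quantized, _ = quantize(features, codebook)
    return float(((features - quantized) ** 2).mean())


# Hypothetical toy data: a random codebook stands in for the scene codebook.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))

# "In-distribution" features: codebook entries plus small noise.
picks = rng.integers(0, 16, size=32)
in_dist = codebook[picks] + 0.01 * rng.normal(size=(32, 4))

# "Out-of-distribution" features: the same features shifted far away.
out_dist = in_dist + 10.0

err_in = reconstruction_error(in_dist, codebook)
err_out = reconstruction_error(out_dist, codebook)
print(f"in-distribution error:     {err_in:.6f}")
print(f"out-of-distribution error: {err_out:.6f}")
```

In this toy setting the in-distribution error is far smaller than the out-of-distribution error. By analogy only, a reconstruction loss of this kind could penalize pose estimates whose rendered features fall outside what the scene codebook can represent; how IR-NeRF actually builds and uses its codebook is not specified in the text above.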