SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields
SceneRF removes the need for depth supervision in monocular 3D reconstruction, which makes it applicable in scenarios where depth data is unavailable. The contribution is incremental in that it builds on neural radiance fields.
The paper tackles the problem of 3D reconstruction from a single 2D image without depth supervision, proposing SceneRF, a self-supervised method using only posed image sequences, and demonstrates that it outperforms all baselines for novel depth view synthesis and scene reconstruction on indoor and outdoor datasets.
3D reconstruction from a single 2D image has been extensively covered in the literature but relies on depth supervision at training time, which limits its applicability. To relax the dependence on depth, we propose SceneRF, a self-supervised monocular scene reconstruction method using only posed image sequences for training. Fueled by the recent progress in neural radiance fields (NeRF), we optimize a radiance field, though with explicit depth optimization and a novel probabilistic sampling strategy to efficiently handle large scenes. At inference, a single input image suffices to hallucinate novel depth views, which are fused together to obtain a 3D scene reconstruction. Thorough experiments demonstrate that we outperform all baselines for novel depth view synthesis and scene reconstruction, on indoor BundleFusion and outdoor SemanticKITTI. Code is available at https://astra-vision.github.io/SceneRF.
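
To make the idea of optimizing depth in a radiance field more concrete, the following is a minimal sketch of how a depth value is commonly rendered along one camera ray using the standard NeRF volume rendering weights. It is not taken from the SceneRF code: the function name `render_depth`, its arguments, and the handling of the last sample interval are assumptions made for illustration only.

```python
import math


def render_depth(sigmas, ts):
    """Render expected depth along one ray (hypothetical helper, NeRF-style).

    sigmas: non-negative volume densities at the ray samples.
    ts:     strictly increasing sample distances from the camera.
    Returns (depth, opacity, weights).
    """
    if len(sigmas) != len(ts) or not ts:
        raise ValueError("sigmas and ts must be non-empty and of equal length")

    weights = []
    transmittance = 1.0  # probability that the ray reaches the current sample
    for i, (sigma, t) in enumerate(zip(sigmas, ts)):
        # Interval length to the next sample. For the last sample we reuse the
        # previous spacing (an assumption; implementations differ here).
        if i + 1 < len(ts):
            delta = ts[i + 1] - t
        elif i > 0:
            delta = t - ts[i - 1]
        else:
            delta = 1.0
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this interval
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha

    depth = sum(w * t for w, t in zip(weights, ts))  # expected ray termination
    opacity = sum(weights)                           # accumulated opacity
    return depth, opacity, weights


# A ray that hits an opaque surface at distance 3.
depth, opacity, weights = render_depth([0.0, 0.0, 1000.0, 0.0], [1.0, 2.0, 3.0, 4.0])
```

Because the rendered depth is a differentiable function of the densities, a training loss can be placed on it directly, which is what makes depth an explicit optimization target rather than a by-product of color rendering.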