MoD-SLAM: Monocular Dense Mapping for Unbounded 3D Scene Reconstruction
MoD-SLAM addresses the restriction of existing monocular SLAM systems to bounded scenes, which broadens their applicability in robotics and AR/VR. The contribution is incremental, since it builds on NeRF-based methods.
The authors propose MoD-SLAM, a monocular dense mapping method for real-time 3D reconstruction in unbounded scenes, and report improvements of up to 30% in reconstruction accuracy and up to 15% in localization accuracy over state-of-the-art monocular SLAM systems.
Monocular SLAM has received considerable attention because it needs only RGB input and removes the constraints imposed by more complex sensors. However, existing monocular SLAM systems are designed for bounded scenes, which restricts their applicability. To address this limitation, we propose MoD-SLAM, the first monocular NeRF-based dense mapping method that allows real-time 3D reconstruction in unbounded scenes. Specifically, we introduce a Gaussian-based unbounded scene representation to solve the challenge of mapping scenes without boundaries. This strategy is essential for extending the range of SLAM applications. Moreover, a depth estimation module in the front-end is designed to extract accurate prior depth values that supervise the mapping and tracking processes. By introducing a robust depth loss term into the tracking process, our SLAM system achieves more precise pose estimation in large-scale scenes. Our experiments on two standard datasets show that MoD-SLAM achieves competitive performance, improving the accuracy of 3D reconstruction and localization by up to 30% and 15%, respectively, compared with existing state-of-the-art monocular SLAM systems.
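
The abstract does not spell out how the unbounded representation is parameterized. As a rough illustration of the general idea of mapping a scene without boundaries into a bounded domain, the sketch below implements the spherical contraction used in mip-NeRF 360, which is a common choice in NeRF-based work. This is an assumption made for illustration only and may differ from MoD-SLAM's Gaussian-based formulation; the function name `contract` is hypothetical.

```python
import math


def contract(point):
    """Map a 3D point from unbounded space into a ball of radius 2.

    Illustrative only: this is the mip-NeRF 360 style contraction, not
    necessarily the parameterization used by MoD-SLAM.
    Points with norm <= 1 are left unchanged; points farther away are
    squashed so that their norm becomes 2 - 1/norm, which stays below 2.
    """
    norm = math.sqrt(sum(c * c for c in point))
    if norm <= 1.0:
        return tuple(float(c) for c in point)
    scale = (2.0 - 1.0 / norm) / norm
    return tuple(c * scale for c in point)


def norm3(point):
    """Euclidean norm of a 3D point (helper for the usage example)."""
    return math.sqrt(sum(c * c for c in point))


if __name__ == "__main__":
    near = contract((0.5, 0.0, 0.0))   # inside the unit ball, unchanged
    far = contract((4.0, 0.0, 0.0))    # norm 4 is mapped to norm 2 - 1/4
    very_far = contract((1e6, 1e6, 1e6))
    print(near, far, norm3(very_far))
```

With a mapping of this kind, arbitrarily distant geometry lands in a fixed-size volume, so a bounded grid or network input range can still cover the whole scene, at the cost of coarser resolution for far-away content.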