DINO-SLAM: DINO-informed RGB-D SLAM for Neural Implicit and Explicit Representations
This work addresses the need for more comprehensive scene representations in SLAM for robotics and AR/VR applications, though it appears incremental as it builds on existing DINO and SLAM methods.
The paper tackles the problem of enhancing neural implicit and explicit representations in SLAM systems by integrating DINO features, and reports superior performance over state-of-the-art methods on the Replica, ScanNet, and TUM benchmarks.
This paper presents DINO-SLAM, a DINO-informed design strategy to enhance neural implicit (Neural Radiance Field, NeRF) and explicit (3D Gaussian Splatting, 3DGS) representations in SLAM systems through more comprehensive scene representations. To this end, we rely on a Scene Structure Encoder (SSE) that enriches DINO features into Enhanced DINO ones (EDINO) to capture hierarchical scene elements and their structural relationships. Building upon it, we propose two foundational paradigms for NeRF and 3DGS SLAM systems integrating EDINO features. Our DINO-informed pipelines achieve superior performance on the Replica, ScanNet, and TUM datasets compared to state-of-the-art methods.
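
To make the idea of "enriching" per-patch features with relational context more concrete, here is a toy sketch in plain Python. It is not the paper's Scene Structure Encoder, whose architecture the abstract does not describe. It assumes, purely for illustration, that enrichment means adding to each patch feature a similarity-weighted average of all patch features. The function names, the `temperature` parameter, and the 2-D example features are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def enrich_features(patch_feats, temperature=0.1):
    """Toy stand-in for feature enrichment (an assumption, NOT the paper's SSE).

    Each patch feature is summed with a context vector: the average of all
    patch features, weighted by softmax-normalised cosine similarity.
    """
    enriched = []
    for f in patch_feats:
        weights = softmax([cosine(f, g) / temperature for g in patch_feats])
        context = [
            sum(w * g[d] for w, g in zip(weights, patch_feats))
            for d in range(len(f))
        ]
        # Residual combination: original feature plus relational context.
        enriched.append([a + c for a, c in zip(f, context)])
    return enriched


# Hypothetical 2-D "patch features": two similar patches and one distinct patch.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
edino_like = enrich_features(feats)
```

In this sketch the two similar patches reinforce each other while the dissimilar third patch contributes almost nothing to their context, which loosely illustrates how relational information between scene elements could be folded into per-patch features.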