CV · Apr 11, 2017

CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction

arXiv:1704.03489v1 · 778 citations
Originality: Incremental advance
AI Analysis

This work addresses dense 3D reconstruction from a single moving camera (monocular SLAM) for robotics and AR/VR applications. It is an incremental improvement that integrates learned depth prediction with existing SLAM techniques.

The paper fuses CNN-predicted depth maps with direct monocular SLAM depth measurements to improve accuracy in low-textured regions and to estimate absolute scale, and it reports robust, accurate results on two benchmark datasets.

Given the recent advances in depth prediction from Convolutional Neural Networks (CNNs), this paper investigates how predicted depth maps from a deep neural network can be deployed for accurate and dense monocular reconstruction. We propose a method where CNN-predicted dense depth maps are naturally fused together with depth measurements obtained from direct monocular SLAM. Our fusion scheme privileges depth prediction in image locations where monocular SLAM approaches tend to fail, e.g. along low-textured regions, and vice-versa. We demonstrate the use of depth prediction for estimating the absolute scale of the reconstruction, hence overcoming one of the major limitations of monocular SLAM. Finally, we propose a framework to efficiently fuse semantic labels, obtained from a single frame, with dense SLAM, yielding semantically coherent scene reconstruction from a single view. Evaluation results on two benchmark datasets show the robustness and accuracy of our approach.
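The fusion idea in the abstract can be illustrated with a minimal sketch. It assumes that each depth source comes with a per-pixel variance and combines them by inverse-variance weighting, so the CNN prediction dominates wherever the SLAM estimate is uncertain (for example in low-textured regions), and vice versa. It also assumes that absolute scale can be approximated as the median ratio between the metric CNN depth and the up-to-scale SLAM depth. The function names `fuse_depth` and `estimate_scale`, the variance inputs, and the median-ratio rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np


def fuse_depth(d_cnn, var_cnn, d_slam, var_slam):
    """Fuse two depth maps per pixel by inverse-variance weighting.

    Hypothetical helper: a pixel with high SLAM variance (e.g. a
    low-textured region) ends up close to the CNN prediction, and
    a pixel with high CNN variance ends up close to the SLAM depth.
    """
    w_cnn = 1.0 / var_cnn
    w_slam = 1.0 / var_slam
    fused = (w_cnn * d_cnn + w_slam * d_slam) / (w_cnn + w_slam)
    fused_var = 1.0 / (w_cnn + w_slam)
    return fused, fused_var


def estimate_scale(d_cnn, d_slam, valid):
    """Estimate a global scale factor for an up-to-scale SLAM depth map.

    Assumption: the CNN depth is metric, so the median ratio over
    valid pixels gives a robust scale for the SLAM depth.
    """
    return float(np.median(d_cnn[valid] / d_slam[valid]))


# Toy example: the second pixel has a very uncertain SLAM depth.
d_cnn = np.array([[2.0, 2.0]])
var_cnn = np.array([[1.0, 1.0]])
d_slam = np.array([[4.0, 4.0]])
var_slam = np.array([[1.0, 1e6]])

fused, fused_var = fuse_depth(d_cnn, var_cnn, d_slam, var_slam)
```

In the toy example, the first pixel has equal variances, so the fused depth is the plain average of the two sources. The second pixel has a very large SLAM variance, so the fused depth stays close to the CNN prediction.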

Code Implementations: 1 repo
