Semi-Dense 3D Semantic Mapping from Monocular SLAM
This work addresses the problem of flexible 3D semantic mapping for robots in varied environments, though it appears incremental as it builds on existing SLAM and deep learning methods.
The paper tackles the challenge of acquiring semantic information in 3D mapping by combining deep learning with semi-dense monocular SLAM, resulting in improved 2D semantic labelling over baseline single-frame predictions on indoor/outdoor datasets.
Combining geometry and appearance in computer vision has proven to be a promising approach for robots across a wide variety of applications. Stereo cameras and RGB-D sensors are widely used to achieve fast, dense 3D reconstruction and trajectory tracking. However, they lack the flexibility to switch seamlessly between environments of different scale, i.e., indoor and outdoor scenes. In addition, semantic information is still hard to acquire in a 3D map. We address this challenge by combining a state-of-the-art deep learning method with semi-dense Simultaneous Localisation and Mapping (SLAM) based on the video stream from a monocular camera. In our approach, 2D semantic information is transferred to the 3D map via correspondences between connected keyframes with spatial consistency. A semantic segmentation is not needed for every frame in a sequence, which keeps the computation time reasonable. We evaluate our method on indoor and outdoor datasets, and it improves 2D semantic labelling over baseline single-frame predictions.
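The 2D-to-3D label transfer described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the fusion rule, so the sketch assumes a pinhole camera model for back-projection and a recursive Bayesian update (multiply class probabilities, then renormalise) for combining the predictions that several keyframes make about the same map point. The function names `back_project`, `fuse_label_probs`, and `fuse_keyframes`, and the intrinsics `fx`, `fy`, `cx`, `cy`, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Lift pixel (u, v) with a known depth into the keyframe's camera frame.

    Assumes an ideal pinhole camera (no lens distortion).
    """
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def fuse_label_probs(prior: List[float], observation: List[float]) -> List[float]:
    """Assumed fusion rule: element-wise product of class probabilities, renormalised."""
    product = [p * o for p, o in zip(prior, observation)]
    total = sum(product)
    if total == 0.0:
        # The observation contradicts the prior completely; keep the prior.
        return list(prior)
    return [p / total for p in product]


def fuse_keyframes(observations: Dict[int, List[List[float]]],
                   num_classes: int) -> Dict[int, int]:
    """Fuse the per-keyframe class distributions of each map point.

    `observations` maps a (hypothetical) map-point id to the list of class
    distributions predicted for it in the keyframes where it is visible.
    Returns the most likely class index per map point.
    """
    labels: Dict[int, int] = {}
    for point_id, distributions in observations.items():
        belief = [1.0 / num_classes] * num_classes  # uniform prior
        for dist in distributions:
            belief = fuse_label_probs(belief, dist)
        labels[point_id] = max(range(num_classes), key=belief.__getitem__)
    return labels
```

Under these assumptions, a map point seen in three keyframes with distributions `[0.6, 0.4]`, `[0.3, 0.7]`, and `[0.2, 0.8]` would be assigned class 1 after fusion, even though the first keyframe alone favours class 0. This is the sense in which multi-keyframe consistency can correct a single-frame prediction.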