CV Sep 16, 2016

SemanticFusion: Dense 3D Semantic Mapping with Convolutional Neural Networks

John McCormac, Ankur Handa, Andrew Davison, Stefan Leutenegger

arXiv:1609.05130v233.0612 citations

Originality: Incremental advance

AI Analysis

The paper addresses the need for semantic mapping in robotics, which supports greater robot intelligence and more intuitive user interaction. The advance is incremental because it builds on existing SLAM and CNN methods.

The paper tackles the problem of creating dense 3D semantic maps for mobile robots by combining CNNs with a dense SLAM system, showing that fusing multiple predictions improves 2D semantic labeling on the NYUv2 dataset and enables real-time operation at about 25Hz.

Ever more robust, accurate and detailed mapping using visual sensing has proven to be an enabling factor for mobile robots across a wide variety of applications. For the next level of robot intelligence and intuitive user interaction, maps need to extend beyond geometry and appearance - they need to contain semantics. We address this challenge by combining Convolutional Neural Networks (CNNs) and a state-of-the-art dense Simultaneous Localisation and Mapping (SLAM) system, ElasticFusion, which provides long-term dense correspondence between frames of indoor RGB-D video even during loopy scanning trajectories. These correspondences allow the CNN's semantic predictions from multiple viewpoints to be probabilistically fused into a map. This not only produces a useful semantic 3D map, but we also show on the NYUv2 dataset that fusing multiple predictions leads to an improvement even in the 2D semantic labelling over baseline single-frame predictions. We also show that for a smaller reconstruction dataset with larger variation in prediction viewpoint, the improvement over single-frame segmentation increases. Our system is efficient enough to allow real-time interactive use at frame rates of approximately 25 Hz.
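The abstract says that per-frame CNN predictions are "probabilistically fused" into the map. One common way to do this is a recursive Bayesian update on each map element: multiply the stored class distribution by the new per-class prediction and renormalise. The sketch below illustrates that idea only. The function names, the uniform initial belief, and the handling of the degenerate case are assumptions made for illustration and are not taken from the authors' code.

```python
def fuse_prediction(belief, prediction, eps=1e-12):
    """Fuse one new per-class prediction into a stored class distribution.

    belief and prediction are equal-length lists of class probabilities.
    (Hypothetical helper; assumes predictions from different frames are
    treated as independent.)
    """
    if len(belief) != len(prediction):
        raise ValueError("belief and prediction must have the same length")
    # Element-wise product of the prior belief and the new evidence.
    unnormalised = [b * p for b, p in zip(belief, prediction)]
    total = sum(unnormalised)
    if total <= eps:
        # Degenerate case: the evidence contradicts the belief everywhere.
        # Keeping the old belief is an assumption of this sketch.
        return list(belief)
    return [u / total for u in unnormalised]


def fuse_views(predictions, num_classes):
    """Fuse predictions from several viewpoints of the same map element."""
    # Start from a uniform distribution over classes (an assumption).
    belief = [1.0 / num_classes] * num_classes
    for prediction in predictions:
        belief = fuse_prediction(belief, prediction)
    return belief


if __name__ == "__main__":
    # Three noisy views of one surface point; class 0 is weakly favoured.
    views = [
        [0.5, 0.3, 0.2],
        [0.6, 0.3, 0.1],
        [0.4, 0.4, 0.2],
    ]
    print(fuse_views(views, num_classes=3))
```

In this toy example no single view assigns class 0 more than 0.6, yet the fused belief in class 0 is higher than in any individual view. This mirrors the abstract's claim that fusing multiple predictions improves on single-frame labelling.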

