CV, RO | Mar 20, 2018

Fusion of stereo and still monocular depth estimates in a self-supervised learning context

arXiv:1803.07512v1 | 24 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more reliable depth maps for autonomous navigation in robots, though it is incremental as it builds on existing self-supervised learning and fusion techniques.

The paper aims to improve depth estimation for autonomous robots by fusing stereo and monocular depth estimates in a self-supervised learning setup. The fused estimates outperform stereo vision alone, as demonstrated on the KITTI dataset and on board a Parrot SLAMDunk.

We study how autonomous robots can learn by themselves to improve their depth estimation capability. In particular, we investigate a self-supervised learning setup in which stereo vision depth estimates serve as targets for a convolutional neural network (CNN) that transforms a single still image into a dense depth map. After training, the stereo and mono estimates are fused with a novel fusion method that preserves high-confidence stereo estimates while leveraging the CNN estimates in low-confidence regions. The main contribution of the article is to show that the fused estimates perform better than the stereo vision estimates alone. Experiments are performed on the KITTI dataset and on board a Parrot SLAMDunk, showing that even rather limited CNNs can help provide stereo-vision-equipped robots with more reliable depth maps for autonomous navigation.
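
The two ideas in the abstract, stereo estimates as training targets and confidence-based fusion, can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not specify the loss, the confidence measure, or the fusion rule, so the function names `masked_l1_loss` and `fuse_depth`, the `threshold` parameter, and the linear blend in low-confidence regions are all assumptions made for illustration. Depth maps are plain nested lists, with `None` marking pixels where stereo matching produced no estimate.

```python
from typing import List, Optional

# Hypothetical representation: a depth map is a list of rows of depths.
# None marks pixels where stereo matching produced no estimate.
StereoMap = List[List[Optional[float]]]
DenseMap = List[List[float]]


def masked_l1_loss(mono: DenseMap, stereo: StereoMap) -> float:
    """Mean absolute error between the CNN output and the stereo targets,
    computed only over pixels where a stereo target exists (assumed loss)."""
    total, count = 0.0, 0
    for mono_row, stereo_row in zip(mono, stereo):
        for m, s in zip(mono_row, stereo_row):
            if s is not None:
                total += abs(m - s)
                count += 1
    return total / count if count else 0.0


def fuse_depth(stereo: StereoMap, mono: DenseMap,
               confidence: DenseMap, threshold: float = 0.8) -> DenseMap:
    """Keep stereo where its confidence is high; otherwise lean on the CNN.

    Assumed rule (not taken from the paper):
      - no stereo estimate      -> use the mono estimate
      - confidence >= threshold -> keep the stereo estimate unchanged
      - otherwise               -> blend, weighting stereo by its confidence
    """
    fused = []
    for s_row, m_row, c_row in zip(stereo, mono, confidence):
        row = []
        for s, m, c in zip(s_row, m_row, c_row):
            if s is None:
                row.append(m)
            elif c >= threshold:
                row.append(s)
            else:
                row.append(c * s + (1.0 - c) * m)
        fused.append(row)
    return fused


if __name__ == "__main__":
    stereo = [[10.0, None], [4.0, 8.0]]
    mono = [[12.0, 20.0], [6.0, 7.0]]
    confidence = [[0.9, 0.0], [0.5, 1.0]]
    print(masked_l1_loss(mono, stereo))
    print(fuse_depth(stereo, mono, confidence))
```

In this toy input, the two high-confidence stereo pixels pass through untouched, the pixel without a stereo match takes the CNN value, and the pixel with confidence 0.5 becomes the average of the two estimates. A real system would operate on image-sized arrays and derive confidence from the stereo matcher itself.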

