CV, Jun 8, 2020

Semantics-Driven Unsupervised Learning for Monocular Depth and Ego-Motion Estimation

arXiv:2006.04371v1, 2 citations
AI Analysis

This work addresses monocular depth and ego-motion estimation, with applications in autonomous driving and robotics. Its contribution is incremental: it builds on existing unsupervised methods and adds semantic enhancements.

The paper tackles monocular depth and ego-motion estimation from videos. It proposes an unsupervised learning approach that uses semantic segmentation to mitigate the effects of dynamic objects and occlusions, and reports good performance on the KITTI dataset.

We propose a semantics-driven unsupervised learning approach for monocular depth and ego-motion estimation from videos. Recent unsupervised learning methods use the photometric error between a synthesized view and the actual image as the supervision signal for training. In our method, we exploit semantic segmentation information to mitigate the effects of dynamic objects and occlusions in the scene, and to improve depth prediction by considering the correlation between depth and semantics. To avoid a costly labeling process, we use noisy semantic segmentation results obtained from a pre-trained semantic segmentation network. In addition, we minimize the position error between corresponding points of adjacent frames to exploit 3D spatial information. Experimental results on the KITTI dataset show that our method achieves good performance in both depth and ego-motion estimation.
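As a rough illustration of the ideas in the abstract, the following NumPy sketch shows one way a semantic mask could gate a photometric error and a 3D position error. It is not the authors' implementation: the function names, the label ids in `DYNAMIC_CLASS_IDS`, the L1 photometric term, and the assumption that corresponding points are already pixel-aligned are all hypothetical choices made for this example.

```python
import numpy as np

# Hypothetical label ids for classes treated as potentially dynamic (assumption,
# not taken from the paper).
DYNAMIC_CLASS_IDS = (11, 12, 13)


def static_mask(seg, dynamic_ids=DYNAMIC_CLASS_IDS):
    """Return 1.0 where the (possibly noisy) label is not a dynamic class, else 0.0."""
    return (~np.isin(seg, dynamic_ids)).astype(np.float64)


def masked_photometric_error(target, synthesized, mask):
    """Mean absolute intensity difference over the pixels kept by the mask."""
    per_pixel = np.abs(target - synthesized).mean(axis=-1)  # H x W
    return float((per_pixel * mask).sum() / max(mask.sum(), 1.0))


def backproject(depth, K):
    """Lift each pixel (u, v) to a 3D camera-frame point: depth * K^-1 [u, v, 1]^T."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)  # H x W x 3
    rays = pix @ np.linalg.inv(K).T
    return rays * depth[..., None]


def position_error(points_a, points_b_in_a, mask):
    """Mean Euclidean distance between corresponding 3D points over masked pixels.

    Assumes points_b_in_a has already been transformed into frame a and resampled
    so that index (v, u) refers to the same scene point in both arrays.
    """
    dist = np.linalg.norm(points_a - points_b_in_a, axis=-1)
    return float((dist * mask).sum() / max(mask.sum(), 1.0))
```

In a training loop, the synthesized view and the transformed points would come from the predicted depth and ego-motion; here they are plain inputs so the masking logic can be read in isolation.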

