CV · Feb 9, 2017

Semi-Supervised Deep Learning for Monocular Depth Map Prediction

arXiv:1702.02706v3 · 692 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of limited training data for depth prediction in dynamic outdoor environments, offering a practical approach for applications such as autonomous driving and robotics.

The paper tackles the problem of monocular depth map prediction by proposing a semi-supervised deep learning approach that combines sparse ground-truth depth with a photoconsistency loss in a stereo setup, achieving superior performance compared to state-of-the-art methods.

Supervised deep learning often suffers from the lack of sufficient training data. Specifically in the context of monocular depth map prediction, it is barely possible to determine dense ground truth depth images in realistic dynamic outdoor environments. When using LiDAR sensors, for instance, noise is present in the distance measurements, the calibration between sensors cannot be perfect, and the measurements are typically much sparser than the camera images. In this paper, we propose a novel approach to depth map prediction from monocular images that learns in a semi-supervised way. While we use sparse ground-truth depth for supervised learning, we also enforce our deep network to produce photoconsistent dense depth maps in a stereo setup using a direct image alignment loss. In experiments we demonstrate superior performance in depth map prediction from single images compared to the state-of-the-art methods.
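The combination described in the abstract, a supervised term on sparse ground-truth depth plus a photoconsistency term between rectified stereo images, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the L1 penalties, the `weight` parameter, the `focal_baseline` constant (focal length times baseline), and both function names are assumptions chosen for illustration, and the network that would produce `pred_depth` is omitted.

```python
import numpy as np


def warp_right_to_left(right, disparity):
    """Resample the right image into the left view.

    Assumes rectified stereo, so the left pixel at column x matches the
    right pixel at column x - d. Uses linear interpolation along rows and
    clamps source coordinates at the image border.
    """
    h, w = right.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity  # source columns
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]


def semi_supervised_loss(pred_depth, sparse_depth, left, right,
                         focal_baseline, weight=1.0):
    """Hypothetical combined loss: sparse supervised term + photometric term.

    sparse_depth uses 0 to mark pixels without a LiDAR measurement.
    """
    # Supervised term: only pixels that have a ground-truth measurement.
    valid = sparse_depth > 0
    if valid.any():
        supervised = np.abs(pred_depth[valid] - sparse_depth[valid]).mean()
    else:
        supervised = 0.0

    # Unsupervised term: predicted depth implies a disparity d = f * B / Z,
    # which should align the right image with the left image.
    disparity = focal_baseline / pred_depth
    warped = warp_right_to_left(right, disparity)
    photometric = np.abs(left - warped).mean()

    return supervised + weight * photometric


# Toy example: a horizontal intensity ramp seen with a 2-pixel disparity.
h, w = 4, 8
right = np.tile(np.arange(w, dtype=np.float64), (h, 1))
left = np.clip(right - 2.0, 0.0, None)   # left[x] = right[x - 2], clamped
sparse = np.zeros((h, w))
sparse[1, 3] = 4.0                       # a single LiDAR return
good = np.full((h, w), 4.0)              # depth 4 -> disparity 8 / 4 = 2
bad = np.full((h, w), 8.0)               # depth 8 -> disparity 1
loss_good = semi_supervised_loss(good, sparse, left, right, focal_baseline=8.0)
loss_bad = semi_supervised_loss(bad, sparse, left, right, focal_baseline=8.0)
```

In this toy setup the correct depth map drives both terms to zero, while the wrong one is penalised by both. The photometric term is evaluated at every pixel, which is what lets the dense prediction be constrained even where no depth measurement exists.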

