CVAug 28, 2017

A Compromise Principle in Deep Monocular Depth Estimation

arXiv:1708.08267v212 citations
AI Analysis

This work addresses a fundamental challenge in 3D scene understanding for applications like robotics and autonomous driving, though it is incremental as it builds on existing deep learning methods.

The paper tackles the problem of poor local solutions in training deep networks for monocular depth estimation by proposing a compromise principle between spatial and depth resolutions, resulting in a regression-classification cascaded network that achieves state-of-the-art results on NYU Depth V2, KITTI, and Make3D benchmarks.

Monocular depth estimation, which plays a key role in understanding 3D scene geometry, is fundamentally an ill-posed problem. Existing methods based on deep convolutional neural networks (DCNNs) have examined this problem by learning convolutional networks to estimate continuous depth maps from monocular images. However, we find that training a network to predict a high spatial resolution continuous depth map often suffers from poor local solutions. In this paper, we hypothesize that achieving a compromise between spatial and depth resolutions can improve network training. Based on this "compromise principle", we propose a regression-classification cascaded network (RCCN), which consists of a regression branch predicting a low spatial resolution continuous depth map and a classification branch predicting a high spatial resolution discrete depth map. The two branches form a cascaded structure allowing the classification and regression branches to benefit from each other. By leveraging large-scale raw training datasets and some data augmentation strategies, our network achieves top or state-of-the-art results on the NYU Depth V2, KITTI, and Make3D benchmarks.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes