CV · Jun 9, 2014

Depth Map Prediction from a Single Image using a Multi-Scale Deep Network

David Eigen, Christian Puhrsch, Rob Fergus

arXiv:1406.2283v1 · 4681 citations

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of 3D scene understanding from monocular images. The advance is rated incremental because it builds on existing deep learning approaches.

The paper predicts depth from a single image with a multi-scale deep network built from two stacks, one making a coarse global prediction and one refining it locally, and reports state-of-the-art results on the NYU Depth and KITTI datasets.

Predicting depth is an essential component in understanding the 3D geometry of a scene. While for stereo images local correspondence suffices for estimation, finding depth relations from a single image is less straightforward, requiring integration of both global and local information from various cues. Moreover, the task is inherently ambiguous, with a large source of uncertainty coming from the overall scale. In this paper, we present a new method that addresses this task by employing two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally. We also apply a scale-invariant error to help measure depth relations rather than scale. By leveraging the raw datasets as large sources of training data, our method achieves state-of-the-art results on both NYU Depth and KITTI, and matches detailed depth boundaries without the need for superpixelation.
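The abstract mentions a scale-invariant error but does not state its formula. As a minimal sketch, assuming the commonly used log-space form (mean squared log difference minus the squared mean log difference), it could be computed as below. The function name `scale_invariant_log_error` and the `lam` parameter are illustrative choices, not taken from the text above.

```python
import math


def scale_invariant_log_error(pred, target, lam=1.0):
    """Sketch of a scale-invariant error in log-depth space (assumed form).

    pred, target: equal-length sequences of positive depth values.
    lam: hypothetical weighting of the scale term; lam=1.0 makes the error
         fully invariant to a global scaling of pred, lam=0.0 reduces it to
         the plain mean squared log difference.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal length")

    # Per-pixel difference of log depths.
    d = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    n = len(d)

    mean_sq = sum(x * x for x in d) / n  # (1/n) * sum(d_i^2)
    mean = sum(d) / n                    # (1/n) * sum(d_i)

    # Subtracting the squared mean removes the contribution of a global
    # scale offset, so only relative depth relations are penalized.
    return mean_sq - lam * mean * mean
```

With `lam=1.0`, multiplying every predicted depth by the same constant leaves the error unchanged, which is the property that lets the measure focus on depth relations rather than overall scale.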
