CV · Jun 9, 2014

Depth Map Prediction from a Single Image using a Multi-Scale Deep Network

arXiv:1406.2283v1 · 4681 citations
Originality: Incremental advance
AI Analysis

This paper addresses 3D scene understanding from monocular images. The advance is rated incremental because it builds on existing deep learning approaches.

The paper predicts depth from a single image using a multi-scale deep network with two stacks, one for a coarse global prediction and one for fine local refinement, and reports state-of-the-art results on the NYU Depth and KITTI datasets.

Predicting depth is an essential component in understanding the 3D geometry of a scene. While for stereo images local correspondence suffices for estimation, finding depth relations from a single image is less straightforward, requiring integration of both global and local information from various cues. Moreover, the task is inherently ambiguous, with a large source of uncertainty coming from the overall scale. In this paper, we present a new method that addresses this task by employing two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally. We also apply a scale-invariant error to help measure depth relations rather than scale. By leveraging the raw datasets as large sources of training data, our method achieves state-of-the-art results on both NYU Depth and KITTI, and matches detailed depth boundaries without the need for superpixelation.
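The scale-invariant error mentioned in the abstract can be illustrated with a small sketch. It assumes the commonly cited log-space form, D = (1/n) Σ d_i² − (λ/n²)(Σ d_i)² with d_i = log(pred_i) − log(target_i); the abstract itself does not state the formula, so the function name, the `lam` parameter, and the sample depths below are illustrative assumptions rather than the authors' code.

```python
import math


def scale_invariant_error(pred, target, lam=1.0):
    """Log-space error that discounts a global scale mismatch (assumed form).

    pred, target: equal-length sequences of positive depth values.
    lam: weight of the scale-correcting term (hypothetical parameter name);
         lam=1.0 removes any global scale factor, lam=0.0 reduces to a
         plain mean squared error in log space.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal length")
    # Per-pixel difference of log depths.
    d = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    n = len(d)
    mean_of_squares = sum(x * x for x in d) / n
    square_of_mean = (sum(d) / n) ** 2
    return mean_of_squares - lam * square_of_mean


# Illustrative depths (made up for this sketch).
target = [1.0, 2.0, 4.0]
doubled = [2.0 * t for t in target]  # correct relations, wrong global scale

err_scale_invariant = scale_invariant_error(doubled, target)
err_plain_log_mse = scale_invariant_error(doubled, target, lam=0.0)
```

With `lam=1.0`, a prediction that is wrong only by a constant factor scores essentially zero, while the plain log-space error (`lam=0.0`) penalizes it by `log(2)**2`. This is the sense in which the error measures depth relations rather than overall scale.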

Code Implementations · 10 repos

Data from Papers with Code (CC-BY-SA-4.0)

