CV, May 23, 2016

Depth from a Single Image by Harmonizing Overcomplete Local Network Predictions

arXiv:1605.07081v2, 119 citations
Originality: Incremental advance
AI Analysis

This work addresses monocular depth estimation, a core computer vision problem. It presents an incremental advance: local network predictions of depth derivatives are combined by a globalization procedure into a single depth map.

The paper estimates depth from a single image in two stages. First, a neural network predicts, at every image location, probability distributions over local depth derivatives. Second, these overcomplete predictions are harmonized into one consistent depth map. The method is evaluated on the NYU v2 depth dataset.

A single color image can contain many cues informative towards different aspects of local geometric structure. We approach the problem of monocular depth estimation by using a neural network to produce a mid-level representation that summarizes these cues. This network is trained to characterize local scene geometry by predicting, at every image location, depth derivatives of different orders, orientations and scales. However, instead of a single estimate for each derivative, the network outputs probability distributions that allow it to express confidence about some coefficients, and ambiguity about others. Scene depth is then estimated by harmonizing this overcomplete set of network predictions, using a globalization procedure that finds a single consistent depth map that best matches all the local derivative distributions. We demonstrate the efficacy of this approach through evaluation on the NYU v2 depth data set.
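
The harmonization step described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it is a simplified 1-D stand-in that assumes each local prediction is a single Gaussian (a mean and a variance) over a derivative coefficient, in which case finding the depth signal that best matches all predictions reduces to weighted least squares. The function names `derivative_operators` and `harmonize`, the choice of finite-difference operators, and the toy numbers are all hypothetical.

```python
import numpy as np

def derivative_operators(n):
    """Hypothetical overcomplete set of linear operators on a length-n depth signal:
    the depth itself (order 0), first differences, and second differences."""
    identity = np.eye(n)
    first = np.diff(identity, n=1, axis=0)   # row i computes z[i+1] - z[i]
    second = np.diff(identity, n=2, axis=0)  # row i computes z[i+2] - 2*z[i+1] + z[i]
    return [identity, first, second]

def harmonize(operators, means, variances):
    """Find the single depth signal z minimizing
    sum_k sum_i ((A_k z)_i - mean_k[i])**2 / variance_k[i].
    Assumption: each prediction is Gaussian, so confident predictions
    (small variance) get large weight and ambiguous ones get small weight."""
    rows, targets = [], []
    for A, mu, var in zip(operators, means, variances):
        w = 1.0 / np.sqrt(var)          # per-coefficient weight
        rows.append(A * w[:, None])     # scale each row of the operator
        targets.append(mu * w)          # scale the matching target
    A_all = np.vstack(rows)
    b_all = np.concatenate(targets)
    z, *_ = np.linalg.lstsq(A_all, b_all, rcond=None)
    return z

# Toy data: absolute depth is predicted with noise and low confidence,
# while derivatives are predicted accurately and with high confidence.
true_depth = np.array([1.0, 1.5, 2.0, 2.5, 4.0, 4.0])
n = len(true_depth)
ops = derivative_operators(n)

rng = np.random.default_rng(0)
noisy_abs = true_depth + rng.normal(0.0, 0.5, size=n)
means = [noisy_abs, ops[1] @ true_depth, ops[2] @ true_depth]
variances = [np.full(n, 0.25), np.full(n - 1, 1e-4), np.full(n - 2, 1e-4)]

depth_est = harmonize(ops, means, variances)
print("noisy absolute depth:", np.round(noisy_abs, 3))
print("harmonized depth:    ", np.round(depth_est, 3))
```

In this toy setting the confident derivative predictions fix the shape of the signal, and the uncertain absolute predictions only determine its overall offset. Richer distributions that express multi-modal ambiguity would not reduce to a single linear solve like this one.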
