CV, Apr 6, 2019

Visualization of Convolutional Neural Networks for Monocular Depth Estimation

arXiv:1904.03380v1, 92 citations
AI Analysis

This work addresses a fundamental question in computer vision research, but its contribution is incremental because it builds on existing depth estimation methods.

The paper asks how convolutional neural networks infer depth from a single image and approaches the question by visualizing the input pixels most relevant to the estimate. The reported results indicate that the approach identifies those pixels across different depth estimation networks and across indoor and outdoor datasets.

Recently, convolutional neural networks (CNNs) have shown great success on the task of monocular depth estimation. A fundamental yet unanswered question is: how can CNNs infer depth from a single image? Toward answering this question, we consider visualizing the inference of a CNN by identifying the pixels of an input image that are relevant to depth estimation. We formulate this as an optimization problem: identify the smallest number of image pixels from which the CNN can estimate a depth map with the minimum difference from the estimate obtained from the entire image. To cope with the difficulty of optimizing through a deep CNN, we propose to use another network to predict those relevant image pixels in a forward computation. In our experiments, we first show the effectiveness of this approach, and then apply it to different depth estimation networks on indoor and outdoor scene datasets. The results provide several findings that help explore the above question.
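
The optimization problem described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: `toy_depth_net` is a hypothetical stand-in for a depth estimation CNN (a simple box blur), and the mean-absolute-difference term, the mean-of-mask sparsity term, and the `sparsity_weight` parameter are all assumptions chosen for illustration. The sketch only shows the trade-off the abstract names: keep the masked-input estimate close to the full-input estimate while selecting as few pixels as possible.

```python
import numpy as np


def toy_depth_net(image):
    """Hypothetical stand-in for a depth CNN: a 3x3 box blur of the input."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def mask_objective(image, mask, depth_net, sparsity_weight=1.0):
    """Assumed objective: output difference plus a penalty on selected pixels.

    image: 2-D float array.
    mask:  2-D array of the same shape with values in [0, 1]; 1 keeps a pixel.
    """
    full_estimate = depth_net(image)            # estimate from the entire image
    masked_estimate = depth_net(image * mask)   # estimate from the selected pixels only
    difference = np.mean(np.abs(full_estimate - masked_estimate))
    sparsity = np.mean(mask)                    # fraction of pixels kept
    return difference + sparsity_weight * sparsity


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8))
    keep_all = np.ones_like(image)
    keep_none = np.zeros_like(image)
    print("keep all pixels :", mask_objective(image, keep_all, toy_depth_net))
    print("keep no pixels  :", mask_objective(image, keep_none, toy_depth_net))
```

Keeping every pixel makes the difference term zero but pays the full sparsity penalty, while keeping none pays no penalty but maximizes the difference. In the approach the abstract describes, a second network predicts the mask in a forward pass instead of the mask being optimized directly for each image; that predictor is not modeled here.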
