CV · Sep 3, 2018

Detail Preserving Depth Estimation from a Single Image Using Attention Guided Networks

arXiv:1809.00646v1 · 98 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for high-quality, detailed depth maps in applications such as robotics and AR/VR, though it appears incremental because it builds on existing CNN approaches.

The paper tackles the problem of blurred depth maps in single image depth estimation by proposing a network with a Dense Feature Extractor and a Depth Map Generator that uses attention mechanisms. It achieves results competitive with the state of the art while preserving better structural details and running at about 15 fps.

Convolutional Neural Networks have demonstrated superior performance on single image depth estimation in recent years. These works usually use stacked spatial pooling or strided convolutions to obtain high-level information, which is common practice in classification tasks. However, depth estimation is a dense prediction problem, and low-resolution feature maps usually produce blurred depth maps, which are undesirable in applications. To produce high-quality depth maps, i.e., clean and accurate ones, we propose a network consisting of a Dense Feature Extractor (DFE) and a Depth Map Generator (DMG). The DFE combines ResNet and dilated convolutions; it extracts multi-scale information from the input image while keeping the feature maps dense. The DMG uses an attention mechanism to fuse the multi-scale features produced by the DFE. Our network is trained end-to-end and needs no post-processing, so it runs fast and predicts depth maps at about 15 fps. Experimental results show that our method is competitive with the state of the art in quantitative evaluation while preserving better structural details of the scene depth.
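The abstract names two mechanisms without giving layer-level detail: dilated convolutions that enlarge the receptive field without shrinking the feature map, and attention-based fusion of multi-scale features. The NumPy sketch below illustrates both under our own assumptions. It is not the authors' DFE or DMG: the single-channel convolution, the 3x3 kernel, the dilation rates (1, 2, 4), and the per-pixel softmax over scales are illustrative choices, and the random `logits` stand in for attention scores that a trained network would predict.

```python
import numpy as np


def dilated_conv2d(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """Single-channel 2-D dilated convolution (cross-correlation, as in CNN layers).

    x: (H, W) feature map; w: (k, k) kernel with odd k.
    Zero padding of dilation * (k // 2) keeps the output at (H, W), so the map
    stays dense (no downsampling) while the receptive field grows with `dilation`.
    """
    k = w.shape[0]
    pad = dilation * (k // 2)
    xp = np.pad(x, pad, mode="constant")
    h, wd = x.shape
    out = np.zeros((h, wd), dtype=float)
    for i in range(k):
        for j in range(k):
            # Each kernel tap reads the input shifted by a multiple of `dilation`.
            out += w[i, j] * xp[i * dilation : i * dilation + h, j * dilation : j * dilation + wd]
    return out


def receptive_field(k: int, dilation: int) -> int:
    """Receptive field (pixels per axis) of one k x k conv with the given dilation."""
    return dilation * (k - 1) + 1


def softmax(z: np.ndarray, axis: int) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_fuse(features: np.ndarray, logits: np.ndarray) -> np.ndarray:
    """Fuse S same-resolution maps with per-pixel attention over scales.

    features, logits: (S, H, W). A softmax over the scale axis gives weights that
    sum to 1 at every pixel; the output is the weighted sum, shape (H, W).
    """
    weights = softmax(logits, axis=0)
    return (weights * features).sum(axis=0)


rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))        # stand-in for one feature channel
kernel = rng.standard_normal((3, 3)) / 3.0   # hypothetical 3x3 kernel
dilations = (1, 2, 4)                        # hypothetical dilation rates
feats = np.stack([dilated_conv2d(image, kernel, d) for d in dilations])
# Hypothetical attention logits; a trained network would predict these.
logits = rng.standard_normal(feats.shape)
fused = attention_fuse(feats, logits)
print(feats.shape, fused.shape)                    # (3, 32, 32) (32, 32)
print([receptive_field(3, d) for d in dilations])  # [3, 5, 9]
```

All three branches keep the full 32 x 32 resolution while their receptive fields grow from 3 to 9 pixels. A stride-2 convolution would instead halve the map to 16 x 16, which is the kind of downsampling the abstract associates with blurred depth maps.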

