CV · Jul 24, 2018

StereoNet: Guided Hierarchical Refinement for Real-Time Edge-Aware Depth Prediction

arXiv:1807.08865v1 · 386 citations
Originality: Highly original
AI Analysis

The paper addresses efficient and accurate depth prediction for applications such as robotics and autonomous driving, and proposes a novel method for this known bottleneck.

The paper tackles real-time stereo matching by introducing StereoNet, an end-to-end deep architecture that runs at 60 fps on an NVidia Titan X and produces high-quality, edge-preserved disparity maps with sub-pixel matching precision an order of magnitude higher than traditional approaches.

This paper presents StereoNet, the first end-to-end deep architecture for real-time stereo matching that runs at 60 fps on an NVidia Titan X, producing high-quality, edge-preserved, quantization-free disparity maps. A key insight of this paper is that the network achieves a sub-pixel matching precision that is an order of magnitude higher than that of traditional stereo matching approaches. This allows us to achieve real-time performance by using a very low resolution cost volume that encodes all the information needed to achieve high disparity precision. Spatial precision is achieved by employing a learned edge-aware upsampling function. Our model uses a Siamese network to extract features from the left and right images. A first estimate of the disparity is computed in a very low resolution cost volume; the model then hierarchically re-introduces high-frequency details through a learned upsampling function that uses compact pixel-to-pixel refinement networks. Leveraging color input as a guide, this function is capable of producing high-quality edge-aware output. We achieve compelling results on multiple benchmarks, showing how the proposed method offers extreme flexibility at an acceptable computational budget.
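To make the low-resolution cost volume idea concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes the volume shrinks by 2x per downsampling step along height, width, and disparity. It also assumes sub-pixel disparity is read out with a soft argmin (a softmax-weighted average over candidate disparities), which the abstract does not specify. The function names, the absolute-difference matching cost, and the toy scanline are all hypothetical stand-ins for the learned Siamese features and cost filtering.

```python
import math


def cost_volume_size(height, width, max_disparity, num_downsamples):
    """Cell count of a cost volume built after `num_downsamples` 2x reductions.

    Assumption: height, width and the disparity range all shrink by the
    same factor, so each 2x step cuts the volume by 8x.
    """
    scale = 2 ** num_downsamples
    return (height // scale) * (width // scale) * (max_disparity // scale)


def soft_argmin(costs):
    """Sub-pixel disparity as the softmax-weighted mean of candidate indices.

    Assumption: a soft argmin is used here for illustration; lower cost
    means a better match, so the costs are negated before the softmax.
    """
    best = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - best)) for c in costs]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


def scanline_disparity(left, right, max_disparity):
    """Coarse disparity for one scanline of scalar features.

    The absolute difference stands in for a learned matching cost.
    Only disparities that stay inside the right scanline are considered.
    """
    disparities = []
    for x in range(len(left)):
        valid = min(max_disparity, x + 1)
        costs = [abs(left[x] - right[x - d]) for d in range(valid)]
        disparities.append(soft_argmin(costs))
    return disparities


# Toy example: the left scanline is the right one shifted by 2 pixels.
right_row = [0.0, 10.0, 50.0, 20.0, 80.0, 30.0, 60.0, 5.0]
left_row = [0.0, 0.0] + right_row[:-2]
coarse = scanline_disparity(left_row, right_row, max_disparity=4)

full = cost_volume_size(720, 1280, 192, num_downsamples=0)
small = cost_volume_size(720, 1280, 192, num_downsamples=3)
```

Under these assumptions, three 2x reductions shrink a 720 x 1280 x 192 volume by a factor of 512, which is why the matching step can be cheap. A disparity estimated at the reduced resolution would also need to be multiplied by the same scale factor once it is upsampled to full resolution, since it is measured in low-resolution pixels.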

Code Implementations: 2 repos
