CV · Mar 26, 2018

Cascaded multi-scale and multi-dimension convolutional neural network for stereo matching

arXiv:1803.09437v2 · 18 citations
Originality: Incremental advance
AI Analysis

This work addresses stereo estimation for computer vision applications, presenting an incremental improvement by integrating cost computation and aggregation in a novel network architecture.

The paper tackles stereo matching by proposing a cascaded CNN that combines multi-scale cost computation with multi-dimensional aggregation, achieving competitive results on the KITTI benchmark without post-processing.
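The following plain-Python sketch illustrates multi-scale cost computation. It is a hand-crafted illustration, not the authors' network. Instead of two learned CNN branches, it computes a sum-of-absolute-differences (SAD) matching cost over two window sizes in parallel and fuses them. The window sizes (3x3 and 7x7) and the equal-weight fusion are assumptions. So are the hypothetical helper names `window_sad`, `multi_scale_cost_volume` and `winner_take_all`. It assumes rectified images stored as lists of rows, with left pixel `x` matching right pixel `x - d`.

```python
from typing import List, Sequence

Image = List[List[float]]
CostVolume = List[List[List[float]]]  # indexed as cost[y][x][d]


def window_sad(left: Image, right: Image, y: int, x: int, d: int, r: int) -> float:
    """Mean absolute difference between the (2r+1)x(2r+1) window centred on
    left[y][x] and the window centred on right[y][x - d].
    Out-of-range coordinates are clamped to the nearest border pixel."""
    h, w = len(left), len(left[0])
    total = 0.0
    for i in range(-r, r + 1):
        yy = min(max(y + i, 0), h - 1)
        for j in range(-r, r + 1):
            xl = min(max(x + j, 0), w - 1)
            xr = min(max(x + j - d, 0), w - 1)
            total += abs(left[yy][xl] - right[yy][xr])
    return total / (2 * r + 1) ** 2


def multi_scale_cost_volume(
    left: Image,
    right: Image,
    max_disp: int,
    radii: Sequence[int] = (1, 3),          # 3x3 and 7x7 windows (assumed sizes)
    weights: Sequence[float] = (0.5, 0.5),  # equal-weight fusion (assumed)
) -> CostVolume:
    """Compute the SAD cost at every window size and fuse by a weighted sum."""
    h, w = len(left), len(left[0])
    return [
        [
            [
                sum(wt * window_sad(left, right, y, x, d, r)
                    for r, wt in zip(radii, weights))
                for d in range(max_disp + 1)
            ]
            for x in range(w)
        ]
        for y in range(h)
    ]


def winner_take_all(cost: CostVolume) -> List[List[int]]:
    """Select the lowest-cost disparity at each pixel."""
    return [[min(range(len(c)), key=c.__getitem__) for c in row] for row in cost]


if __name__ == "__main__":
    # Synthetic textured pair: the left view is the right view shifted by 2 px.
    def texture(y: int, x: int) -> float:
        return float((7 * x + 3 * y) % 11)

    H, W, TRUE_D = 8, 16, 2
    right = [[texture(y, x) for x in range(W)] for y in range(H)]
    left = [[texture(y, x - TRUE_D) for x in range(W)] for y in range(H)]
    disp = winner_take_all(multi_scale_cost_volume(left, right, max_disp=4))
    print(disp[3][7:13])  # interior pixels away from the borders -> [2, 2, 2, 2, 2, 2]
```

The small window preserves fine detail near depth edges. The large one is more robust in weakly textured regions. This is the receptive-field trade-off the abstract describes, which the paper addresses with learned branches rather than fixed windows and weights.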

Convolutional neural networks (CNNs) have been shown to outperform conventional stereo algorithms for stereo estimation. Numerous efforts focus on pixel-wise matching cost computation, an important building block of many state-of-the-art algorithms. However, those architectures are limited to small, single-scale receptive fields and either use traditional methods for cost aggregation or ignore it altogether. In contrast, we take both cost computation and cost aggregation into account. First, we propose a new multi-scale matching cost computation sub-network in which two receptive fields of different sizes are implemented in parallel. In this way, the network can make the best use of both and balance the trade-off between a larger receptive field and the loss of detail. Furthermore, we show that our multi-dimension aggregation sub-network, which contains both 2D and 3D convolution operations, can provide rich context and semantic information for estimating an accurate initial disparity. Finally, experiments on the challenging KITTI stereo benchmark demonstrate that the proposed method achieves competitive results even without any additional post-processing.
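The cost-aggregation and initial-disparity stages can be sketched in the same illustrative spirit. In this plain-Python version, fixed box filters stand in for learned convolutions. A 2D stage smooths each disparity slice spatially. A 3D stage then also averages across neighbouring disparities. The initial disparity is read out with soft-argmin, a softmax-weighted average of candidate disparities popularized by GC-Net. The abstract does not say which regression the paper uses. The soft-argmin readout, the kernel sizes, the 2D-then-3D ordering and the hypothetical names `box_filter_3d`, `soft_argmin` and `aggregate_and_regress` are therefore all assumptions.

```python
import math
from typing import List

CostVolume = List[List[List[float]]]  # indexed as cost[y][x][d]


def box_filter_3d(cost: CostVolume, ry: int, rx: int, rd: int) -> CostVolume:
    """Average over a (2ry+1) x (2rx+1) x (2rd+1) neighbourhood, truncated at
    the volume borders. With rd = 0 this is a purely spatial (2D) filter
    applied to each disparity slice independently."""
    h, w, n = len(cost), len(cost[0]), len(cost[0][0])
    out = [[[0.0] * n for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for d in range(n):
                vals = [
                    cost[yy][xx][dd]
                    for yy in range(max(0, y - ry), min(h, y + ry + 1))
                    for xx in range(max(0, x - rx), min(w, x + rx + 1))
                    for dd in range(max(0, d - rd), min(n, d + rd + 1))
                ]
                out[y][x][d] = sum(vals) / len(vals)
    return out


def soft_argmin(costs: List[float], temperature: float = 1.0) -> float:
    """Sub-pixel disparity: sum_d d * softmax(-cost_d / T) over candidates d."""
    m = min(costs)  # shift for numerical stability; does not change the softmax
    weights = [math.exp(-(c - m) / temperature) for c in costs]
    z = sum(weights)
    return sum(d * wt for d, wt in enumerate(weights)) / z


def aggregate_and_regress(cost: CostVolume, temperature: float = 1.0) -> List[List[float]]:
    """2D stage (spatial only), then 3D stage (spatial + disparity), then soft-argmin."""
    agg_2d = box_filter_3d(cost, ry=1, rx=1, rd=0)
    agg_3d = box_filter_3d(agg_2d, ry=1, rx=1, rd=1)
    return [[soft_argmin(c, temperature) for c in row] for row in agg_3d]


if __name__ == "__main__":
    # Toy volume: every pixel's cost curve is (d - 2)^2 for d = 0..4.
    toy = [[[float((d - 2) ** 2) for d in range(5)] for _ in range(6)] for _ in range(4)]
    disp = aggregate_and_regress(toy)
    print(round(disp[0][0], 6), round(disp[2][3], 6))  # -> 2.0 2.0
```

Because the aggregated cost curve in the toy example stays symmetric about d = 2, soft-argmin returns 2 at every pixel. Unlike a hard winner-take-all, soft-argmin is differentiable and yields sub-pixel values, which is why networks that regress disparity end to end often use it.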
