CV, Oct 25, 2021

Multi-scale Iterative Residuals for Fast and Scalable Stereo Matching

arXiv:2110.12769v1
Originality: Incremental advance
AI Analysis

The paper addresses the speed-accuracy trade-off in stereo matching for applications such as autonomous driving, though the advance is incremental because it builds on existing networks.

It tackles the accuracy gap between real-time and state-of-the-art stereo matching models by introducing an iterative multi-scale coarse-to-fine refinement framework. The reported result is up to 49× faster inference (compared to GANet-deep) and 4× less memory consumption, with comparable error.

Despite the remarkable progress of deep learning in stereo matching, there exists a gap in accuracy between real-time models and slower state-of-the-art models that are suitable for practical applications. This paper presents an iterative multi-scale coarse-to-fine refinement (iCFR) framework to bridge this gap. The framework can adopt any stereo matching network and make it faster, more efficient, and more scalable while keeping comparable accuracy. To reduce the computational cost of matching, we use multi-scale warped features to estimate disparity residuals and push the disparity search range in the cost volume to a minimum limit. Finally, we apply a refinement network to recover the loss of precision that is inherent in multi-scale approaches. We test our iCFR framework by adopting the matching networks from the state-of-the-art GANet and AANet. The result is 49$\times$ faster inference time compared to GANet-deep and 4$\times$ less memory consumption, with comparable error. Our best-performing network, which we call FRSNet, is scalable even up to an input resolution of 6K on a GTX 1080Ti, with inference time still below one second and accuracy comparable to AANet+. It outperforms all real-time stereo methods and achieves competitive accuracy on the KITTI benchmark.
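The coarse-to-fine residual idea in the abstract can be illustrated with a toy example. The sketch below is not the paper's network: it assumes a single 1-D scanline pair, uses raw intensities with a sum-of-absolute-differences cost in place of learned features, and omits the refinement network entirely. All function names and parameters (`coarse_to_fine`, `coarse_range`, `residual_range`) are hypothetical. It only shows how a small search at the coarsest scale, followed by narrow residual searches around the upsampled estimate, keeps the number of candidate disparities per pixel small.

```python
import random

def downsample(row):
    """Halve a scanline by averaging adjacent pixel pairs."""
    return [(row[i] + row[i + 1]) / 2.0 for i in range(0, len(row) - 1, 2)]

def window_cost(left, right, x, d, radius=1):
    """SAD cost of matching left pixel x to right pixel x - d (borders clamped)."""
    n = len(left)
    cost = 0.0
    for k in range(-radius, radius + 1):
        xl = min(max(x + k, 0), n - 1)
        xr = min(max(xl - d, 0), n - 1)
        cost += abs(left[xl] - right[xr])
    return cost

def best_disparity(left, right, x, candidates):
    """Return the candidate disparity with the lowest window cost."""
    return min(candidates, key=lambda d: window_cost(left, right, x, d))

def coarse_to_fine(left, right, levels=3, coarse_range=4, residual_range=1):
    # Build the pyramid: index 0 is full resolution, the last entry is coarsest.
    pyramid = [(left, right)]
    for _ in range(levels - 1):
        l, r = pyramid[-1]
        pyramid.append((downsample(l), downsample(r)))

    # Coarsest scale: search a small absolute range 0..coarse_range.
    l, r = pyramid[-1]
    disp = [best_disparity(l, r, x, range(coarse_range + 1)) for x in range(len(l))]

    # Finer scales: upsample (disparity doubles), then search only a residual
    # of +/- residual_range around the upsampled estimate.
    for l, r in reversed(pyramid[:-1]):
        up = [2 * disp[min(x // 2, len(disp) - 1)] for x in range(len(l))]
        disp = [
            best_disparity(
                l, r, x,
                [d for d in range(up[x] - residual_range, up[x] + residual_range + 1) if d >= 0],
            )
            for x in range(len(l))
        ]
    return disp

# Synthetic scanline pair with a constant true disparity of 8 pixels.
rng = random.Random(0)
true_d, n = 8, 64
texture = [rng.random() for _ in range(n + true_d)]
left = texture[:n]          # left[x] == right[x - true_d] wherever x >= true_d
right = texture[true_d:]
disp = coarse_to_fine(left, right)
```

In this toy setup each full-resolution pixel evaluates only three candidate disparities, instead of every value in the full search range, because the coarser scales have already supplied most of the estimate. Pixels near the left border have no valid match in the right scanline, so their estimates are unreliable.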

