SCV-Stereo: Learning Stereo Matching from a Sparse Cost Volume
This work addresses a computational bottleneck in stereo matching for computer vision applications, offering a more efficient method with competitive performance.
The paper tackles the computational inefficiency of dense cost volumes in CNN-based stereo matching by proposing SCV-Stereo, which uses sparse cost volume representations to enable iterative disparity updates. On the KITTI Stereo benchmarks, it achieves a significantly better trade-off between accuracy and efficiency.
Convolutional neural network (CNN)-based stereo matching approaches generally require a dense cost volume (DCV) for disparity estimation. However, generating such cost volumes is computationally intensive and memory-consuming, which hinders CNN training and inference efficiency. To address this problem, we propose SCV-Stereo, a novel CNN architecture capable of learning dense stereo matching from sparse cost volume (SCV) representations. Our approach is motivated by the observation that DCV representations are somewhat redundant and can be replaced with SCV representations. Benefiting from these SCV representations, SCV-Stereo can update disparity estimations iteratively for accurate and efficient stereo matching. Extensive experiments on the KITTI Stereo benchmarks demonstrate that SCV-Stereo achieves a significantly better trade-off between accuracy and efficiency for stereo matching. Our project page is https://sites.google.com/view/scv-stereo.
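To make the redundancy argument concrete, the following is a minimal sketch of how a dense cost volume can be reduced to a sparse one on a single scanline. It is illustrative only and rests on several assumptions not stated in the abstract: matching cost is taken to be the inner product of feature vectors, sparsity is obtained by keeping the k highest-scoring disparity candidates per pixel, and the helper names (`dense_cost_row`, `sparse_cost_row`) are hypothetical rather than taken from the SCV-Stereo code.

```python
from heapq import nlargest


def dot(a, b):
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense_cost_row(left, right, max_disp):
    """Dense costs for one scanline: cost[x][d] = <left[x], right[x - d]>.

    Candidates that fall outside the right image get -inf so they are
    never preferred over a valid candidate.
    """
    neg_inf = float("-inf")
    return [
        [dot(left[x], right[x - d]) if x - d >= 0 else neg_inf
         for d in range(max_disp)]
        for x in range(len(left))
    ]


def sparse_cost_row(dense, k):
    """Keep only the k best (disparity, cost) pairs per pixel (assumed top-k rule)."""
    return [
        nlargest(k, enumerate(costs), key=lambda dc: dc[1])
        for costs in dense
    ]


# Toy one-hot features: the right scanline is the left one shifted by 1 pixel,
# so left pixel x matches right pixel x - 1 wherever that pixel exists.
left = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
right = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]

dense = dense_cost_row(left, right, max_disp=3)
sparse = sparse_cost_row(dense, k=1)

dense_entries = sum(len(row) for row in dense)    # width * max_disp
sparse_entries = sum(len(row) for row in sparse)  # width * k
best_disparities = [row[0][0] for row in sparse]
```

In this toy case the dense row stores `width * max_disp` costs while the sparse row stores `width * k` pairs, yet the best-matching disparity for each pixel is preserved. Full-resolution feature maps and the iterative update described in the abstract are outside the scope of this sketch.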