Multi-Scale Cost Volumes Cascade Network for Stereo Matching
This work provides a faster and more accurate stereo matching solution for robot navigation, which is an incremental improvement for the robotics community.
This paper addresses the trade-off between speed and accuracy in stereo matching for robot navigation. The proposed MSCVNet, a hybrid approach combining traditional methods and neural networks, runs significantly faster than top-performing methods (24x faster than CSPN, 44x faster than GANet) while also improving accuracy over traditional methods and other real-time stereo matching networks.
Stereo matching is essential for robot navigation. However, the traditional methods in wide use today have low accuracy, while CNN-based methods incur high computational cost and long running times. The choice of cost volume plays a crucial role in balancing speed and accuracy. We therefore propose MSCVNet, which combines traditional methods and neural networks to improve the quality of the cost volume. Concretely, our network first generates multiple 3D cost volumes at different resolutions and then uses 2D convolutions to construct a novel cascade hourglass network for cost aggregation. In addition, we design an algorithm that identifies discontinuous areas in the disparity result and computes a loss for them. According to the KITTI official website, our network is much faster than most top-performing methods (24 times faster than CSPN, 44 times faster than GANet, etc.). Compared to traditional methods (SPS-St, SGM) and other real-time stereo matching networks (Fast DS-CS, DispNetC, RTSNet, etc.), our network achieves a large improvement in accuracy, demonstrating the feasibility and capability of the proposed method.
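
To make the idea of multiple 3D cost volumes at different resolutions concrete, the following is a minimal NumPy sketch and not the MSCVNet implementation. It assumes a plain absolute-difference matching cost on grayscale images and average-pool downsampling; the function names (`downsample`, `cost_volume`, `multi_scale_cost_volumes`) and the scale factors are hypothetical choices made for illustration only.

```python
import numpy as np


def downsample(img, factor):
    """Average-pool a 2D image by an integer factor (crops any remainder)."""
    h, w = img.shape
    h2, w2 = h // factor, w // factor
    cropped = img[:h2 * factor, :w2 * factor]
    return cropped.reshape(h2, factor, w2, factor).mean(axis=(1, 3))


def cost_volume(left, right, max_disp):
    """Build a 3D cost volume of shape (max_disp, H, W).

    cost[d, y, x] = |left[y, x] - right[y, x - d]|.
    Pixels with x < d have no candidate match and are set to inf.
    The absolute-difference cost is an assumption, not the paper's cost.
    """
    h, w = left.shape
    vol = np.full((max_disp, h, w), np.inf, dtype=np.float64)
    for d in range(max_disp):
        vol[d, :, d:] = np.abs(left[:, d:] - right[:, :w - d])
    return vol


def multi_scale_cost_volumes(left, right, max_disp, factors=(4, 2, 1)):
    """Return one cost volume per scale, coarsest first.

    The disparity range shrinks with the resolution, so coarse volumes
    are much cheaper to build and aggregate than the full-resolution one.
    """
    volumes = []
    for f in factors:
        l_small, r_small = downsample(left, f), downsample(right, f)
        volumes.append(cost_volume(l_small, r_small, max(1, max_disp // f)))
    return volumes


# Synthetic pair: the left view is the right view shifted by 4 pixels.
rng = np.random.default_rng(0)
right = rng.random((16, 32))
left = np.roll(right, 4, axis=1)

volumes = multi_scale_cost_volumes(left, right, max_disp=8)
# Winner-takes-all disparity from the full-resolution volume.
disparity = volumes[-1].argmin(axis=0)
```

In this sketch the coarse volumes only illustrate the reduced search space. In the abstract's description, the volumes are aggregated by a cascade hourglass network built from 2D convolutions rather than by a direct argmin.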