CV · Nov 17, 2016

End-to-end Learning of Cost-Volume Aggregation for Real-time Dense Stereo

arXiv:1611.05689v1 · 25 citations
Originality: Incremental advance
AI Analysis

This work addresses real-time dense stereo matching for applications such as autonomous driving and balances speed against accuracy; it is an incremental advance that builds on existing cost-volume methods.

The paper tackles dense stereo matching by using a deep convolutional network to predict the local parameters of cost-volume aggregation. This makes end-to-end training possible, and the method reports a 6.34% error rate on KITTI 2015 at 29 fps.
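The summary quotes an error rate without defining it. Assuming it is KITTI 2015's D1 outlier rate, a pixel counts as an outlier when its disparity error exceeds both 3 px and 5% of the ground-truth disparity, and the rate is taken only over pixels that have ground truth. A minimal sketch of that computation follows; the function name `d1_error` and the toy arrays are hypothetical.

```python
import numpy as np

def d1_error(disp_est, disp_gt, abs_thresh=3.0, rel_thresh=0.05):
    """Fraction of ground-truth pixels whose disparity error exceeds BOTH
    abs_thresh pixels and rel_thresh times the true disparity."""
    valid = disp_gt > 0  # assumption: 0 marks pixels without ground truth
    err = np.abs(disp_est - disp_gt)
    outlier = (err > abs_thresh) & (err > rel_thresh * disp_gt)
    return float(outlier[valid].mean())

# Toy 1x4 example; the third pixel has no ground truth and is ignored.
gt = np.array([[10.0, 100.0, 0.0, 50.0]])
est = np.array([[14.0, 104.0, 7.0, 50.0]])
rate = d1_error(est, gt)
# A 4 px error at d=10 is an outlier; the same error at d=100 is within 5%.
print(f"D1 = {rate:.3f}")  # D1 = 0.333
```

Because both thresholds must be exceeded, a fixed pixel error is penalized at small disparities (nearby-threshold objects far away) but tolerated at large ones.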

We present a new deep learning-based approach for dense stereo matching. Unlike previous works, our approach does not use deep learning of pixel appearance descriptors, employing very fast classical matching scores instead. At the same time, our approach uses a deep convolutional network to predict the local parameters of the cost-volume aggregation process, which in this paper we implement using a differentiable domain transform. By treating this transform as a recurrent neural network, we are able to train the whole system end-to-end, including cost-volume computation, cost-volume aggregation (smoothing), and winner-takes-all disparity selection. The resulting method is highly efficient at test time while achieving good matching accuracy. On the KITTI 2015 benchmark, it achieves a 6.34% error rate while running at 29 frames per second on a modern GPU.
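The three stages named in the abstract can be illustrated with a minimal NumPy sketch, under several assumptions that are not from the paper: absolute-difference matching costs, the recursive-filter form of the domain transform run left-right, right-left, top-down and bottom-up, and hand-crafted edge-aware weights (in the style of Gastal and Oliveira's domain transform) standing in for the CNN-predicted parameters. All function names and parameters (`sigma_s`, `sigma_r`, `iterations`) are hypothetical; this is not the authors' implementation and it performs no training.

```python
import numpy as np

def cost_volume_ad(left, right, max_disp, fill=1.0):
    """C[d, y, x] = |L(y, x) - R(y, x - d)| for grayscale images in [0, 1].
    Where x - d falls outside the right image, the cost is set to `fill`."""
    h, w = left.shape
    cost = np.full((max_disp + 1, h, w), fill)
    for d in range(max_disp + 1):
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, : w - d])
    return cost

def dt_weights(guide, sigma_s=4.0, sigma_r=0.2):
    """HYPOTHETICAL stand-in for the CNN output: edge-aware weights in (0, 1).
    w_h[y, x] couples (y, x-1) and (y, x); w_v[y, x] couples (y-1, x) and (y, x)."""
    a = np.exp(-np.sqrt(2.0) / sigma_s)
    w_h = np.zeros_like(guide)
    w_v = np.zeros_like(guide)
    w_h[:, 1:] = a ** (1.0 + sigma_s / sigma_r * np.abs(np.diff(guide, axis=1)))
    w_v[1:, :] = a ** (1.0 + sigma_s / sigma_r * np.abs(np.diff(guide, axis=0)))
    return w_h, w_v

def recursive_pass_x(vol, w):
    """Causal then anti-causal recursion along the last axis. Each step is a
    gated linear RNN update h_x = (1 - w_x) * c_x + w_x * h_{x-1}."""
    out = vol.copy()
    width = vol.shape[-1]
    for x in range(1, width):
        out[..., x] = (1 - w[:, x]) * out[..., x] + w[:, x] * out[..., x - 1]
    for x in range(width - 2, -1, -1):
        out[..., x] = (1 - w[:, x + 1]) * out[..., x] + w[:, x + 1] * out[..., x + 1]
    return out

def aggregate(cost, w_h, w_v, iterations=2):
    """Smooth every disparity slice with alternating horizontal/vertical passes."""
    for _ in range(iterations):
        cost = recursive_pass_x(cost, w_h)
        # Swap the y and x axes so the same routine filters along columns.
        cost = recursive_pass_x(cost.transpose(0, 2, 1), w_v.T).transpose(0, 2, 1)
    return cost

def winner_takes_all(cost):
    """Pick the disparity with the lowest aggregated cost at each pixel."""
    return np.argmin(cost, axis=0)

# Synthetic pair with a constant true disparity of 2 pixels.
rng = np.random.default_rng(0)
left = rng.random((24, 48))
right = np.roll(left, -2, axis=1)  # right(y, x - 2) == left(y, x) for x >= 2
cost = cost_volume_ad(left, right, max_disp=4)
w_h, w_v = dt_weights(left)
disp = winner_takes_all(aggregate(cost, w_h, w_v))
print(disp[:, 16:].min(), disp[:, 16:].max())  # 2 2
```

Every pass forms a convex combination of costs, so aggregation never leaves the range of the raw costs, and a weight near 0 (a strong image edge) stops smoothing from crossing that edge. In the paper the weights come from a CNN and gradients flow through the recursion; reproducing that would require an autodiff framework rather than plain NumPy.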
