CV · Sep 4, 2017

Self-Supervised Learning for Stereo Matching with Self-Improving Ability

arXiv:1709.00930v1 · 174 citations
Originality: Incremental advance
AI Analysis

This work addresses the scarcity of ground-truth disparity data for stereo matching in applications such as robotics and autonomous driving. It is rated an incremental advance because it builds on well-known image-warping concepts.

The paper tackles dense stereo matching without ground-truth disparity maps by proposing a self-supervised convolutional neural network trained with image warping error as its loss function. It outperforms many state-of-the-art methods on the KITTI and Middlebury benchmarks while running significantly faster.

Existing deep-learning-based dense stereo matching methods often rely on ground-truth disparity maps as training signals, but such maps are unavailable in many situations. In this paper, we design a simple convolutional neural network architecture that learns to compute dense disparity maps directly from stereo inputs. Training is performed end to end, without the need for ground-truth disparity maps. The idea is to use image warping error (instead of disparity-map residuals) as the loss function that drives learning, with the aim of finding a depth map that minimizes the warping error. Although this concept is well known in stereo matching, making it work in a deep-learning framework requires overcoming many non-trivial challenges, and in this work we provide effective solutions. Our network adapts itself to unseen imagery as well as to different camera settings. Experiments on the KITTI and Middlebury stereo benchmark datasets show that our method outperforms many state-of-the-art stereo matching methods by a margin while being significantly faster.
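To make the "warping error as loss" idea concrete, here is a minimal sketch in NumPy. It assumes rectified grayscale images, a disparity map defined on the left view (a left pixel at column x matches the right pixel at column x − d), an L1 photometric error, and linear interpolation along each row. The function names `warp_right_to_left` and `warping_error` are hypothetical, and the sketch does not reproduce the paper's network or any additional loss terms it may use.

```python
import numpy as np


def warp_right_to_left(right: np.ndarray, disparity: np.ndarray) -> np.ndarray:
    """Reconstruct the left view by sampling the right image at column x - d.

    Assumes rectified grayscale images of shape (H, W) and a left-view
    disparity map of the same shape, in pixels. Sampling uses linear
    interpolation along each row, so the output varies smoothly with d.
    """
    h, w = right.shape
    # Fractional sample position in the right image for every left pixel.
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    # Clamp positions that fall outside the right image to its border.
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    alpha = xs - x0  # interpolation weight toward the right neighbour
    rows = np.arange(h)[:, None]
    return (1.0 - alpha) * right[rows, x0] + alpha * right[rows, x1]


def warping_error(left: np.ndarray, right: np.ndarray, disparity: np.ndarray) -> float:
    """Mean absolute (L1) difference between the left image and the warped right image."""
    warped = warp_right_to_left(right, disparity)
    return float(np.mean(np.abs(left - warped)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    right = rng.random((4, 10))
    # Synthesize a left image whose true disparity is 3 pixels everywhere.
    left = np.empty_like(right)
    left[:, 3:] = right[:, :-3]
    left[:, :3] = right[:, :1]  # border pixels repeat the first column
    print("error at true disparity:", warping_error(left, right, np.full((4, 10), 3.0)))
    print("error at zero disparity:", warping_error(left, right, np.zeros((4, 10))))
```

The correct disparity drives the error to zero, while a wrong one leaves a residual. Because the sample position depends linearly on d and the interpolation is piecewise linear, the error is differentiable almost everywhere with respect to the disparity. That property lets a network predicting d be trained with it by gradient descent, with no ground-truth disparity involved.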
