Flow2Stereo: Effective Self-Supervised Learning of Optical Flow and Stereo Matching
This work addresses the high cost of annotating data for dense correspondence tasks such as optical flow and stereo matching. It offers a self-supervised approach that goes beyond incremental gains and sets new state-of-the-art results on standard benchmarks.
The paper tackles the joint learning of optical flow and stereo matching by modeling stereo as a special case of optical flow and using 3D geometry from stereoscopic videos, resulting in a single model that achieves the highest accuracy among unsupervised methods on KITTI benchmarks and even outperforms some fully supervised methods.
In this paper, we propose a unified method to jointly learn optical flow and stereo matching. Our first intuition is that stereo matching can be modeled as a special case of optical flow, and that we can leverage the 3D geometry behind stereoscopic videos to guide the learning of these two forms of correspondence. We then incorporate this knowledge into the state-of-the-art self-supervised learning framework and train a single network to estimate both flow and stereo. Second, we uncover the bottlenecks in prior self-supervised learning approaches and propose a new set of challenging proxy tasks to boost performance. These two insights yield a single model that achieves the highest accuracy among all existing unsupervised flow and stereo methods on the KITTI 2012 and 2015 benchmarks. Remarkably, our self-supervised method even outperforms several state-of-the-art fully supervised methods, including PWC-Net and FlowNet2, on KITTI 2012.
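The idea that stereo is a special case of optical flow can be made concrete with a small sketch. In a rectified stereo pair, a left-image pixel (x, y) with disparity d matches the right-image pixel (x - d, y), so stereo is flow whose vertical component is zero. The sketch below also shows one illustrative way stereoscopic video geometry can tie the two tasks together: two routes from the left frame at time t to the right frame at t+1 (stereo then flow, or flow then stereo) should land on the same pixel. This route comparison is an assumption chosen for illustration; the abstract does not spell out the paper's exact constraints. All function names (`disparity_to_flow`, `flow_to_disparity`, `compose`, `path_residual`) are hypothetical, and the nearest-neighbour lookup is a simplification of the bilinear sampling a real network would use.

```python
import math
from typing import List, Optional, Tuple

# flow[y][x] = (u, v): horizontal and vertical displacement of pixel (x, y)
Flow = List[List[Tuple[float, float]]]


def disparity_to_flow(disparity: List[List[float]]) -> Flow:
    """Rewrite a left-view disparity map as left-to-right optical flow.

    In a rectified pair, left pixel (x, y) matches right pixel (x - d, y),
    so stereo is optical flow restricted to the horizontal direction.
    """
    return [[(-d, 0.0) for d in row] for row in disparity]


def flow_to_disparity(flow: Flow, max_vertical: float = 0.5) -> List[List[float]]:
    """Read disparity back out of a left-to-right flow field.

    Raises ValueError if a vector has a vertical component larger than
    `max_vertical` (a hypothetical tolerance), which a rectified pair
    should not produce.
    """
    disparity = []
    for row in flow:
        out_row = []
        for u, v in row:
            if abs(v) > max_vertical:
                raise ValueError(f"vertical component {v} too large for rectified stereo")
            out_row.append(-u)
        disparity.append(out_row)
    return disparity


def compose(first: Flow, second: Flow, x: int, y: int) -> Optional[Tuple[float, float]]:
    """Follow `first` from (x, y), then `second` from the landing pixel.

    Uses nearest-neighbour lookup (a simplification); returns None if the
    intermediate pixel falls outside the image.
    """
    u1, v1 = first[y][x]
    x2, y2 = int(round(x + u1)), int(round(y + v1))
    if not (0 <= y2 < len(second) and 0 <= x2 < len(second[0])):
        return None
    u2, v2 = second[y2][x2]
    return (u1 + u2, v1 + v2)


def path_residual(stereo_t: Flow, stereo_t1: Flow, flow_left: Flow,
                  flow_right: Flow, x: int, y: int) -> Optional[float]:
    """Compare two routes from left frame t to right frame t+1.

    Route A: left_t -> right_t (stereo) -> right_t+1 (temporal flow).
    Route B: left_t -> left_t+1 (temporal flow) -> right_t+1 (stereo).
    For a point visible in all four views, both routes should end at the
    same pixel, so a large residual signals an inconsistent estimate.
    """
    a = compose(stereo_t, flow_right, x, y)
    b = compose(flow_left, stereo_t1, x, y)
    if a is None or b is None:
        return None
    return math.hypot(a[0] - b[0], a[1] - b[1])


if __name__ == "__main__":
    # Constant disparity of 1 pixel and a static scene: both routes agree.
    stereo = disparity_to_flow([[1.0, 1.0, 1.0, 1.0]])
    still = [[(0.0, 0.0)] * 4]
    print(path_residual(stereo, stereo, still, still, 2, 0))  # 0.0
```

Because disparity is just a horizontal flow field, one network can in principle output both quantities in the same format, and a residual like the one above can be turned into a loss that penalizes flow and stereo predictions that disagree about the underlying 3D geometry.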