RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching
RAFT-Stereo addresses the problem of accurate and efficient stereo matching for computer vision applications, and its reported results represent a substantial gain rather than an incremental improvement.
The paper tackles stereo matching by introducing RAFT-Stereo, a deep architecture based on the optical flow network RAFT. RAFT-Stereo ranks first on the Middlebury leaderboard, with a 29% improvement in 1px error over the next best method, and outperforms all published work on the ETH3D two-view stereo benchmark.
We introduce RAFT-Stereo, a new deep architecture for rectified stereo based on the optical flow network RAFT. We introduce multi-level convolutional GRUs, which more efficiently propagate information across the image. A modified version of RAFT-Stereo can perform accurate real-time inference. RAFT-Stereo ranks first on the Middlebury leaderboard, outperforming the next best method on 1px error by 29%, and outperforms all published work on the ETH3D two-view stereo benchmark. Code is available at https://github.com/princeton-vl/RAFT-Stereo.
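To make the headline metric concrete, the following is a minimal sketch of how a 1px error and a relative improvement between two methods might be computed. It assumes the common bad-pixel definition (the fraction of pixels with valid ground truth whose absolute disparity error exceeds 1 pixel) and assumes that the 29% figure is a relative reduction in that error; neither assumption is spelled out in the abstract. The function names `bad_pixel_rate` and `relative_improvement` and all numbers are hypothetical and are not taken from the paper or its repository.

```python
from typing import Optional, Sequence


def bad_pixel_rate(
    pred: Sequence[float],
    gt: Sequence[Optional[float]],
    threshold: float = 1.0,
) -> float:
    """Fraction of valid pixels whose absolute disparity error exceeds `threshold`.

    Assumption: pixels without ground truth are marked as None and skipped.
    """
    valid = 0
    bad = 0
    for p, g in zip(pred, gt):
        if g is None:  # no ground-truth disparity for this pixel
            continue
        valid += 1
        if abs(p - g) > threshold:
            bad += 1
    if valid == 0:
        raise ValueError("no valid ground-truth pixels")
    return bad / valid


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Relative reduction in error of a new method with respect to a baseline."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical disparities (in pixels) for five pixels; one has no ground truth.
pred = [10.2, 11.9, 30.0, 7.5, 4.0]
gt = [10.0, 10.5, 30.4, None, 4.9]

one_px_error = bad_pixel_rate(pred, gt, threshold=1.0)

# Hypothetical error rates for a baseline and a new method.
gain = relative_improvement(baseline_error=0.20, new_error=0.15)
```

Under this reading, a method that lowers the 1px error from a baseline value `e` to `0.71 * e` would be described as a 29% improvement.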