CV · RO · Mar 30, 2023

NeRF-Supervised Deep Stereo

arXiv:2303.17603v1 · 71 citations · h-index: 37
Originality: Highly original
AI Analysis

The method offers a more accessible and effective way to train stereo vision models in a self-supervised manner, removing the dependence on ground-truth data in applications such as robotics and 3D reconstruction.

The paper tackles the problem of training deep stereo networks without ground-truth data by using neural rendering to generate training data from single-camera sequences, resulting in models that achieve a 30-40% improvement over existing self-supervised methods on the Middlebury dataset and often outperform supervised models in zero-shot generalization.

We introduce a novel framework for training deep stereo networks effortlessly and without any ground-truth. By leveraging state-of-the-art neural rendering solutions, we generate stereo training data from image sequences collected with a single handheld camera. On top of them, a NeRF-supervised training procedure is carried out, from which we exploit rendered stereo triplets to compensate for occlusions and depth maps as proxy labels. This results in stereo networks capable of predicting sharp and detailed disparity maps. Experimental results show that models trained under this regime yield a 30-40% improvement over existing self-supervised methods on the challenging Middlebury dataset, filling the gap to supervised models and, most times, outperforming them at zero-shot generalization.
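The two ingredients named in the abstract, depth maps used as proxy labels and rendered stereo triplets used to compensate for occlusions, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a rectified pinhole stereo setup, where disparity follows the standard relation d = f · B / z, and it assumes that occlusions are handled by taking the per-pixel minimum of two photometric errors, one per side view. The function names, parameters, and sample values are all hypothetical.

```python
def depth_to_disparity(depth_row, focal_px, baseline):
    """Convert rendered depth values to disparities via d = f * B / z.

    Assumption: rectified pinhole cameras; non-positive depths are treated
    as invalid and mapped to a disparity of 0.0.
    """
    return [focal_px * baseline / z if z > 0 else 0.0 for z in depth_row]


def triplet_photometric_error(center_row, left_warped_row, right_warped_row):
    """Per-pixel minimum of the absolute errors against the two side views.

    Assumption: a pixel occluded in one side view is usually visible in the
    other, so keeping the smaller error discounts the occluded side.
    """
    return [
        min(abs(c - l), abs(c - r))
        for c, l, r in zip(center_row, left_warped_row, right_warped_row)
    ]


# Hypothetical values, for illustration only.
depth_row = [2.0, 4.0, 0.0]  # last entry marks an invalid depth
disparity_row = depth_to_disparity(depth_row, focal_px=100.0, baseline=0.5)

center_row = [0.5, 0.5]
left_warped_row = [0.5, 0.9]   # second pixel mismatched (e.g. occluded) on the left
right_warped_row = [0.1, 0.5]  # first pixel mismatched (e.g. occluded) on the right
error_row = triplet_photometric_error(center_row, left_warped_row, right_warped_row)
```

In this toy example each pixel is mismatched in exactly one side view, so the minimum picks the error from the view where it is still visible. A real training loop would operate on full image tensors and would obtain the warped views by resampling with the predicted disparity.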

Code Implementations: 2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
