TriStereoNet: A Trinocular Framework for Multi-baseline Disparity Estimation
This work targets depth estimation for autonomous driving, using multi-baseline data to improve accuracy. The contribution is incremental in that it builds on existing stereo methods.
The paper addresses the limited input available from a fixed-baseline binocular setup by introducing TriStereoNet, an end-to-end network for a trinocular setup that combines a narrow and a wide stereo pair. In disparity estimation, it outperforms a similar architecture fed with either pair alone.
Stereo vision is an effective technique for depth estimation with broad applicability in autonomous urban and highway driving. While various deep learning-based approaches have been developed for stereo, the input data from a binocular setup with a fixed baseline are limited. To address this limitation, we present an end-to-end network for processing the data from a trinocular setup, which combines a narrow and a wide stereo pair. In this design, the two binocular pairs, which share a common reference image, are processed with shared network weights and fused at mid level. We also propose a Guided Addition method for merging the 4D data of the two baselines. Additionally, we present an iterative sequential self-supervised and supervised learning scheme on real and synthetic datasets, which makes training the trinocular system practical without requiring ground-truth data for the real dataset. Experimental results demonstrate that the trinocular disparity network outperforms a similar architecture fed with the individual pairs. Code and dataset: https://github.com/cogsys-tuebingen/tristereonet.
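
As background for the multi-baseline idea, the sketch below illustrates the standard pinhole relation Z = f * B / d for a rectified stereo pair. It is not taken from the TriStereoNet code, and it does not implement Guided Addition. It assumes rectified, collinear cameras with an identical focal length and a shared reference image, in which case the disparity of a scene point scales linearly with the baseline. The function names, focal length, and baseline values are hypothetical and chosen only for illustration.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Pinhole stereo relation Z = f * B / d for a rectified pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def rescale_disparity(disparity_px, from_baseline_m, to_baseline_m):
    """Convert a disparity measured on one baseline to the equivalent
    disparity on another baseline (same reference image assumed)."""
    return disparity_px * to_baseline_m / from_baseline_m


# Hypothetical camera parameters, not taken from the paper.
FOCAL_PX = 1000.0
NARROW_BASELINE_M = 0.25
WIDE_BASELINE_M = 0.5

# A point seen with 8 px disparity in the narrow pair.
d_narrow = 8.0
d_wide = rescale_disparity(d_narrow, NARROW_BASELINE_M, WIDE_BASELINE_M)

# Both pairs should agree on the depth of the same scene point.
z_narrow = depth_from_disparity(d_narrow, FOCAL_PX, NARROW_BASELINE_M)
z_wide = depth_from_disparity(d_wide, FOCAL_PX, WIDE_BASELINE_M)
```

Under these assumptions the wide pair yields a larger disparity for the same depth, which gives finer depth resolution at long range, while the narrow pair keeps disparities small for nearby objects. This complementarity is one common motivation for combining two baselines.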