MFuseNet: Robust Depth Estimation with Learned Multiscopic Fusion
This work addresses depth estimation for robotics and vision applications, but the contribution is incremental, as it builds on existing stereo matching with controlled camera motion.
The paper tackles depth estimation with a multiscopic vision system built around a low-cost monocular camera. In experiments on the Middlebury dataset and in real robot demonstrations, the system outperforms traditional two-frame stereo matching methods.
We design a multiscopic vision system that uses a low-cost monocular RGB camera to obtain accurate depth estimates. Unlike multi-view stereo, where images are captured at unconstrained camera poses, the proposed system controls the motion of the camera to capture a sequence of images at horizontally or vertically aligned positions with the same parallax. Within this system, we propose a new heuristic method and a robust learning-based method to fuse multiple cost volumes computed between the reference image and its surrounding images. To obtain training data, we build a synthetic dataset of multiscopic images. Experiments on the real-world Middlebury dataset and a real robot demonstration show that our multiscopic vision system outperforms traditional two-frame stereo matching methods in depth estimation. Our code and dataset are available at https://sites.google.com/view/multiscopic.
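To make the idea of fusing cost volumes concrete, the following is a minimal sketch and not the paper's heuristic or learned fusion. It assumes three grayscale images (a center reference plus one image taken an equal baseline to the left and one to the right), integer disparities, an absolute-difference matching cost, and a per-pixel minimum as the fusion rule. The names `cost_volume` and `estimate_disparity` are hypothetical. Because the two neighbors share the same parallax, a scene point at disparity d in the reference appears at x + d in the left-camera image and at x - d in the right-camera image, so both cost volumes are indexed by the same d.

```python
import numpy as np

def cost_volume(ref, other, max_disp, shift_sign):
    """Absolute-difference cost volume of shape (max_disp + 1, H, W).

    shift_sign = +1 compares ref[y, x] with other[y, x + d];
    shift_sign = -1 compares ref[y, x] with other[y, x - d].
    Positions whose match falls outside the image keep an infinite cost.
    """
    ref = np.asarray(ref, dtype=np.float64)
    other = np.asarray(other, dtype=np.float64)
    h, w = ref.shape
    vol = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        if shift_sign > 0:
            vol[d, :, :w - d] = np.abs(ref[:, :w - d] - other[:, d:])
        else:
            vol[d, :, d:] = np.abs(ref[:, d:] - other[:, :w - d])
    return vol

def estimate_disparity(center, left, right, max_disp):
    """Fuse two cost volumes by per-pixel minimum, then pick the best disparity."""
    vol_left = cost_volume(center, left, max_disp, +1)
    vol_right = cost_volume(center, right, max_disp, -1)
    # Assumed fusion rule: a pixel that is occluded or out of view in one
    # neighbor can still find a good match in the other.
    fused = np.minimum(vol_left, vol_right)
    # Winner-take-all over the disparity axis.
    return np.argmin(fused, axis=0)
```

The minimum is only one simple choice of fusion rule. Averaging the two volumes would reduce noise but would let an occluded view corrupt the cost, which is the kind of trade-off a learned fusion could be trained to resolve.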