BP-MVSNet: Belief-Propagation-Layers for Multi-View-Stereo
This work improves depth-map quality for 3D reconstruction, but the contribution is incremental: it builds on existing MVS methods and extends them with new regularization components.
The authors tackle multi-view stereo depth estimation by integrating a differentiable Conditional Random Field (CRF) layer, solved with belief propagation, into a CNN, and report state-of-the-art results on the DTU, Tanks and Temples, and ETH3D benchmarks.
In this work, we propose BP-MVSNet, a convolutional neural network (CNN)-based Multi-View-Stereo (MVS) method that uses a differentiable Conditional Random Field (CRF) layer for regularization. To this end, we extend the existing belief propagation (BP) layer with the components needed to use it successfully in the MVS setting. First, we show how to compute a normalization based on the expected 3D error, which we then use to normalize the label jumps in the CRF; this is required to make the BP layer invariant to the different scales that occur in MVS. Second, to enable fractional label jumps, we propose a differentiable interpolation step that we embed into the computation of the pairwise term. These extensions allow us to integrate the BP layer into a multi-scale MVS network that progressively refines a rough initial estimate into high-quality depth maps. We evaluate BP-MVSNet in an ablation study and in extensive experiments on the DTU, Tanks and Temples, and ETH3D datasets. The experiments show that BP-MVSNet significantly outperforms the baseline and achieves state-of-the-art results.
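The two extensions described in the abstract (normalizing label jumps by an expected error, and interpolating the pairwise term at fractional jumps) can be illustrated with a small sketch. It is not the authors' implementation, and several pieces are assumptions. The function names are hypothetical. expected_depth_error uses the classic two-view stereo approximation sigma = depth^2 * disp_err / (focal * baseline) as a stand-in for the paper's expected 3D error. The pairwise term is modeled as a learned cost table evaluated by linear interpolation, which is one plausible reading of the "differentiable interpolation step". Min-sum BP runs on a single 1-D chain (one scanline) rather than a full image grid. Each pixel carries its own depth hypotheses, so neighboring labels differ by fractional amounts in general.

```python
import math
from typing import List, Sequence


def expected_depth_error(depth: float, focal: float, baseline: float,
                         disp_err: float = 1.0) -> float:
    """Hypothetical stand-in for the expected 3D error: the classic two-view
    stereo approximation sigma = depth^2 * disp_err / (focal * baseline)."""
    return depth * depth * disp_err / (focal * baseline)


def pairwise_cost(jump: float, weights: Sequence[float]) -> float:
    """Evaluate a (learned) cost table at a fractional label jump by linear
    interpolation; jumps beyond the last entry are truncated to its cost."""
    t = len(weights) - 1
    x = min(abs(jump), float(t))
    i = min(int(math.floor(x)), t - 1)
    frac = x - i
    return (1.0 - frac) * weights[i] + frac * weights[i + 1]


def chain_bp(unary: List[List[float]], depths: List[List[float]],
             weights: Sequence[float], focal: float, baseline: float,
             normalize: bool = True) -> List[int]:
    """Min-sum BP on a 1-D chain with per-pixel depth hypotheses.
    Returns the label with the lowest belief at every pixel."""
    n, k = len(unary), len(unary[0])

    def pair(p: int, i: int, q: int, j: int) -> float:
        dp, dq = depths[p][i], depths[q][j]
        jump = abs(dp - dq)
        if normalize:
            # Express the depth jump in units of the expected error at that depth.
            jump /= expected_depth_error(0.5 * (dp + dq), focal, baseline)
        return pairwise_cost(jump, weights)

    def pass_messages(order: List[int]) -> List[List[float]]:
        # msgs[q] holds the message sent into q by its predecessor in `order`.
        msgs = [[0.0] * k for _ in range(n)]
        for p, q in zip(order, order[1:]):
            for j in range(k):
                msgs[q][j] = min(unary[p][i] + msgs[p][i] + pair(p, i, q, j)
                                 for i in range(k))
            m = min(msgs[q])  # subtract the minimum for numerical stability
            msgs[q] = [v - m for v in msgs[q]]
        return msgs

    fwd = pass_messages(list(range(n)))
    bwd = pass_messages(list(range(n - 1, -1, -1)))
    labels = []
    for q in range(n):
        belief = [unary[q][j] + fwd[q][j] + bwd[q][j] for j in range(k)]
        labels.append(min(range(k), key=belief.__getitem__))
    return labels


if __name__ == "__main__":
    # Toy scanline: the middle pixel's hypotheses are shifted by half a step.
    depths = [[9.0, 10.0, 11.0], [9.5, 10.5, 11.5], [9.0, 10.0, 11.0]]
    unary = [[5.0, 0.0, 5.0], [0.0, 0.05, 5.0], [5.0, 0.0, 5.0]]
    weights = [0.0, 1.0, 2.0]  # hypothetical learned table (truncated linear)
    print(chain_bp(unary, depths, weights, focal=100.0, baseline=1.0))  # [1, 1, 1]
    print(chain_bp(unary, depths, weights, 100.0, 1.0, normalize=False))  # [1, 0, 1]
```

In the toy example, both candidate depths for the middle pixel (9.5 and 10.5) are 0.5 away from their neighbors' depth of 10.0. Without normalization the two jumps cost the same, so the unary term decides in favor of label 0. With normalization, the same jump costs less at the larger depth because the expected error grows with depth, so label 1 wins. This is the kind of scale dependence the normalization is meant to make explicit. Every operation used here (abs, min, the interpolation weights) is piecewise differentiable, which is what allows such a layer to be trained end to end in an autodiff framework.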