Revisiting PatchMatch Multi-View Stereo for Urban 3D Reconstruction
This work addresses 3D reconstruction for urban scenarios, presenting an incremental improvement with a novel loss term and refinement algorithm.
The paper tackles urban 3D reconstruction by proposing a pipeline based on PatchMatch Multi-View Stereo, achieving state-of-the-art performance on the KITTI dataset.
In this paper, a complete pipeline for image-based 3D reconstruction of urban scenarios is proposed, based on PatchMatch Multi-View Stereo (MVS). Input images are first fed into an off-the-shelf visual SLAM system to extract camera poses and sparse keypoints, which are used to initialize the PatchMatch optimization. Then, pixelwise depths and normals are iteratively computed in a multi-scale framework with a novel depth-normal consistency loss term and a global refinement algorithm that counterbalances the inherently local nature of PatchMatch. Finally, a large-scale point cloud is generated by back-projecting multi-view consistent estimates into 3D. The proposed approach is carefully evaluated against both classical MVS algorithms and monocular depth networks on the KITTI dataset, showing state-of-the-art performance.
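The final back-projection step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a standard pinhole camera with a 3x3 intrinsic matrix `K`, a 4x4 camera-to-world pose `cam_to_world`, depth measured along the optical axis, and a boolean mask `valid` marking the pixels that passed some multi-view consistency check. The function name and all parameter names are hypothetical.

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world, valid=None):
    """Lift a depth map to world-space 3D points (pinhole model, assumed)."""
    h, w = depth.shape
    # Pixel grid: u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    if valid is None:
        # Assumption: a depth of zero or less marks a rejected estimate.
        valid = depth > 0
    u, v, z = u[valid], v[valid], depth[valid]
    # Homogeneous pixel coordinates, shape (3, N).
    pix = np.stack([u, v, np.ones_like(u)], axis=0).astype(np.float64)
    # Normalized camera rays scaled by depth along the optical axis.
    pts_cam = (np.linalg.inv(K) @ pix) * z
    # Rigid transform from the camera frame to the world frame.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return (R @ pts_cam).T + t  # shape (N, 3)
```

Concatenating the arrays returned for each keyframe would give one large point cloud. How pixels are judged multi-view consistent is left outside this sketch, since the abstract does not specify it.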