Rethinking the Multi-view Stereo from the Perspective of Rendering-based Augmentation
This work addresses reconstruction problems in large-scale 3D scenes for the MVS community, but it is incremental because it builds on existing methods.
The paper tackles the challenges of large-scale Multi-View Stereo (MVS) on GigaMVS by combining learning-based and traditional methods and by using rendered images to fine-tune MVSFormer, achieving the Top-1 ranking on GigaReconstruction.
GigaMVS presents several challenges to existing Multi-View Stereo (MVS) algorithms because of its large scale, complex occlusions, and gigapixel images. To address these problems, we first apply one of the state-of-the-art learning-based MVS methods, MVSFormer, to overcome intractable scenarios such as textureless and reflective regions, which traditional PatchMatch methods struggle with; however, it fails to reconstruct a few large scenes. We therefore leverage traditional PatchMatch algorithms such as ACMMP, OpenMVS, and RealityCapture to further improve completeness in large scenes. Furthermore, to unify the advantages of deep learning methods and traditional PatchMatch, we propose rendering depth and color images to further fine-tune the MVSFormer model. Notably, we find that the MVS method can produce much better predictions on rendered images due to the coincident illumination, which we believe is significant for the MVS community. Thus, MVSFormer is capable of generalizing to large-scale scenes and complementarily solves the textureless reconstruction problem. Finally, we assembled all point clouds mentioned above \textit{except those from RealityCapture} and ranked Top-1 on the competitive GigaReconstruction.
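The abstract does not say how the point clouds from the different methods were assembled. As a rough illustration only, the sketch below shows one common way such a merge could be done: concatenate the clouds and keep one centroid per occupied voxel, so that regions covered by several methods are not over-represented. The function name `merge_point_clouds` and the `voxel_size` parameter are hypothetical and not taken from the paper.

```python
import numpy as np


def merge_point_clouds(clouds, voxel_size=0.05):
    """Concatenate (N_i, 3) point arrays and keep one centroid per occupied voxel.

    Hypothetical helper for illustration; not the authors' fusion procedure.
    """
    points = np.concatenate(
        [np.asarray(c, dtype=np.float64).reshape(-1, 3) for c in clouds], axis=0
    )
    if points.shape[0] == 0:
        return points

    # Integer voxel index of every point.
    voxels = np.floor(points / voxel_size).astype(np.int64)

    # Group points that fall into the same voxel.
    _, inverse, counts = np.unique(
        voxels, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions

    # Average the points inside each voxel.
    sums = np.zeros((counts.shape[0], 3), dtype=np.float64)
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]
```

Calling `merge_point_clouds([cloud_a, cloud_b], voxel_size=0.1)` on two arrays of XYZ coordinates returns a single array with at most one point per 0.1-unit voxel. A real pipeline would likely also carry colors, normals, or confidence values and filter outliers, which this sketch leaves out.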