Enhancing Multi-view Stereo with Contrastive Matching and Weighted Focal Loss
This work addresses accuracy and completeness issues in 3D reconstruction for computer vision applications, representing an incremental improvement over existing networks.
The paper tackles the problem of improving accuracy and completeness in learning-based multi-view stereo (MVS) by proposing a Contrast Matching Loss and a Weighted Focal Loss, achieving state-of-the-art performance on the DTU, Tanks and Temples, and BlendedMVS datasets.
Learning-based multi-view stereo (MVS) methods have made impressive progress and surpassed traditional methods in recent years. However, their accuracy and completeness still leave room for improvement. In this paper, we propose a new method, inspired by contrastive learning and feature matching, to enhance the performance of existing networks. First, we propose a Contrast Matching Loss (CML), which treats the correct matching points along the depth dimension as positive samples and all other points as negative samples, and computes a contrastive loss based on feature similarity. We further propose a Weighted Focal Loss (WFL) for better classification capability, which uses the predicted confidence to weaken the contribution of low-confidence pixels in unimportant areas to the loss. Extensive experiments on the DTU, Tanks and Temples, and BlendedMVS datasets show that our method achieves state-of-the-art performance and a significant improvement over the baseline network.
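The two losses described above can be illustrated for a single pixel with a minimal sketch. This is not the paper's formulation, which the abstract does not give. The sketch assumes an InfoNCE-style contrastive loss over the depth hypotheses for CML, and a standard focal term scaled by a confidence-dependent weight for WFL. The function names, the temperature, the focusing parameter gamma, the confidence threshold, and the down-weighting factor are all hypothetical choices made for illustration.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def contrast_matching_loss(similarities, positive_index, temperature=0.1):
    """Assumed InfoNCE-style loss for one pixel.

    similarities[d] is the feature similarity between the reference pixel
    and its match at depth hypothesis d. The hypothesis at positive_index
    (the correct match) is the positive; all others are negatives.
    """
    probs = softmax(similarities, temperature)
    return -math.log(max(probs[positive_index], 1e-12))


def weighted_focal_loss(probs, target_index, gamma=2.0,
                        conf_threshold=0.5, low_conf_weight=0.25):
    """Assumed focal loss with a confidence-based weight for one pixel.

    probs is the predicted distribution over depth hypotheses. Pixels whose
    peak probability (used here as the predicted confidence) falls below
    conf_threshold contribute with the reduced weight low_conf_weight.
    """
    p_t = max(probs[target_index], 1e-12)
    confidence = max(probs)
    weight = 1.0 if confidence >= conf_threshold else low_conf_weight
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


# Example: four depth hypotheses, the correct match is hypothesis 1.
sims = [0.2, 0.9, 0.1, 0.3]
cml = contrast_matching_loss(sims, positive_index=1)
wfl = weighted_focal_loss(softmax(sims, temperature=0.1), target_index=1)
print(cml, wfl)
```

In this sketch the contrastive term rewards a similarity profile that peaks at the correct depth hypothesis, while the weight in the focal term reduces the influence of pixels the network itself is unsure about. A real training setup would apply both per pixel over the whole cost volume and would likely use a tensor library rather than Python lists.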