CV · Apr 30, 2021

Deep Multi-View Stereo gone wild

arXiv:2104.15119v2 · 21 citations
AI Analysis

This paper addresses 3D reconstruction from uncontrolled Internet photo collections, a setting in which deep multi-view stereo methods behave very differently than they do on controlled benchmark datasets.

The paper investigates whether deep multi-view stereo (MVS) methods, which excel on controlled datasets, also perform well on Internet photo collections. It finds that complex unsupervised approaches fail to train on data in the wild unless three elements are combined (upsampling the output, softmin-based aggregation, and a single reconstruction loss), while supervised depthmap-based methods achieve state-of-the-art results when reconstructing from only a few images.

Deep multi-view stereo (MVS) methods have been developed and extensively compared on simple datasets, where they now outperform classical approaches. In this paper, we ask whether the conclusions reached in controlled scenarios are still valid when working with Internet photo collections. We propose a methodology for evaluation and explore the influence of three aspects of deep MVS methods: network architecture, training data, and supervision. We make several key observations, which we extensively validate quantitatively and qualitatively, both for depth prediction and complete 3D reconstructions. First, complex unsupervised approaches cannot train on data in the wild. Our new approach makes it possible with three key elements: upsampling the output, softmin-based aggregation, and a single reconstruction loss. Second, supervised deep depthmap-based MVS methods are state-of-the-art for reconstruction from a few Internet images. Finally, our evaluation provides very different results from the usual ones. This shows that evaluation in uncontrolled scenarios is important for new architectures.
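The abstract names softmin-based aggregation without giving a formula. As a rough illustration only, the sketch below shows one generic way a softmin can combine per-view errors: each source view's error is weighted by a softmax over the negated errors, so views with low error dominate and views that disagree (for example because of occlusion) are down-weighted. The function name `softmin_aggregate`, the `temperature` parameter, and the choice to aggregate per-pixel errors are assumptions made for this example; the paper's actual formulation may differ.

```python
import numpy as np


def softmin_aggregate(errors, temperature=1.0):
    """Softmin-weighted average of per-view errors along axis 0.

    errors: array of shape (num_views, ...) holding one error map per
    source view. Hypothetical helper, not taken from the paper's code.
    """
    errors = np.asarray(errors, dtype=np.float64)
    # Softmin weights are a softmax over the negated, scaled errors.
    logits = -errors / temperature
    # Subtract the per-pixel maximum for numerical stability.
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)
    # Low-error views receive most of the weight.
    return (weights * errors).sum(axis=0)


# Two views, two pixels: the views agree on pixel 0 and disagree on pixel 1.
example_errors = np.array([[1.0, 1.0],
                           [1.0, 100.0]])
aggregated = softmin_aggregate(example_errors)
```

With a small temperature the result approaches the per-pixel minimum over views; with a very large temperature it approaches the plain mean, so the temperature controls how strongly outlier views are suppressed.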

Code Implementations: 1 repo
