Learning a Multi-View Stereo Machine
This work addresses the problem of efficient and robust 3D reconstruction for computer vision applications. It offers a novel method for a known bottleneck rather than a paradigm shift.
The paper tackles 3D reconstruction from images by developing a differentiable multi-view stereo system that incorporates 3D geometry through feature projection and unprojection along viewing rays. This enables reconstruction from fewer images (even one) and completion of unseen surfaces. Evaluation on ShapeNet shows benefits over classical and learning-based methods.
We present a learnt system for multi-view stereopsis. In contrast to recent learning-based methods for 3D reconstruction, we leverage the underlying 3D geometry of the problem through feature projection and unprojection along viewing rays. By formulating these operations in a differentiable manner, we are able to learn the system end-to-end for the task of metric 3D reconstruction. End-to-end learning allows us to jointly reason about shape priors while conforming to geometric constraints, enabling reconstruction from far fewer images (even a single image) than required by classical approaches, as well as completion of unseen surfaces. We thoroughly evaluate our approach on the ShapeNet dataset and demonstrate its benefits over classical approaches as well as recent learning-based methods.
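To make the idea of unprojecting features along viewing rays concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes a pinhole camera with intrinsics `K` and extrinsics `Rt`, and it lifts a 2D feature map into 3D by projecting each grid point into the image and sampling the feature map bilinearly. The function name `unproject_features` and all parameter names are hypothetical and chosen only for illustration. Bilinear sampling is used here because it is piecewise linear in the sampled coordinates, which is one common way to keep such an operation differentiable in an autodiff framework.

```python
import numpy as np

def unproject_features(feat, K, Rt, grid_points):
    """Lift a 2D feature map onto 3D points (illustrative sketch).

    feat:        (H, W, C) image feature map, with H >= 2 and W >= 2
    K:           (3, 3) pinhole intrinsics (assumed camera model)
    Rt:          (3, 4) world-to-camera extrinsics [R | t]
    grid_points: (N, 3) 3D point coordinates in the world frame
    Returns (N, C) features; points behind the camera or projecting
    outside the image receive zeros.
    """
    H, W, _ = feat.shape
    n = grid_points.shape[0]

    # World -> camera coordinates using homogeneous points.
    homog = np.concatenate([grid_points, np.ones((n, 1))], axis=1)  # (N, 4)
    cam = homog @ Rt.T                                              # (N, 3)
    z = cam[:, 2]

    # Perspective projection to pixel coordinates (u = column, v = row).
    pix = cam @ K.T
    safe_z = np.where(z > 1e-8, z, 1.0)  # avoid division by zero
    u = pix[:, 0] / safe_z
    v = pix[:, 1] / safe_z

    # A point is valid if it lies in front of the camera and inside the image.
    valid = (z > 1e-8) & (u >= 0) & (u <= W - 1) & (v >= 0) & (v <= H - 1)

    # Clamp so that the indexing below stays in bounds for invalid points too.
    u = np.clip(u, 0, W - 1)
    v = np.clip(v, 0, H - 1)
    u0 = np.clip(np.floor(u).astype(int), 0, W - 2)
    v0 = np.clip(np.floor(v).astype(int), 0, H - 2)
    du = (u - u0)[:, None]
    dv = (v - v0)[:, None]

    # Bilinear interpolation over the four neighbouring pixels.
    sampled = ((1 - du) * (1 - dv) * feat[v0, u0]
               + du * (1 - dv) * feat[v0, u0 + 1]
               + (1 - du) * dv * feat[v0 + 1, u0]
               + du * dv * feat[v0 + 1, u0 + 1])

    return np.where(valid[:, None], sampled, 0.0)
```

In this sketch every 3D point on the same viewing ray projects to the same pixel and therefore receives the same feature. Resolving which point along the ray actually lies on the surface is left to whatever combines the unprojected features from several views, which the sketch does not attempt to model.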