A-TVSNet: Aggregated Two-View Stereo Network for Multi-View Stereo Depth Estimation
This work addresses accurate 3D reconstruction for computer vision applications. It is an incremental improvement that contributes a novel refinement network and a multi-view aggregation method.
The paper tackles depth map estimation from multi-view stereo images by proposing A-TVSNet. The network comprises a base network, a refinement network that uses photometric and geometric information, and an attentional multi-view aggregation framework. It produces high-quality depth maps and outperforms competing approaches on various datasets.
We propose a learning-based network for depth map estimation from multi-view stereo (MVS) images. The network consists of three sub-networks: 1) a base network for initial depth map estimation from an unstructured stereo image pair, 2) a novel refinement network that leverages both photometric and geometric information, and 3) an attentional multi-view aggregation framework that enables efficient information exchange and integration among different stereo image pairs. The proposed network, called A-TVSNet, is evaluated on various MVS datasets and produces high-quality depth maps, outperforming competing approaches. Our code is available at https://github.com/daiszh/A-TVSNet.
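
To make the idea of attentional aggregation across stereo pairs concrete, the following is a minimal NumPy sketch, not the A-TVSNet implementation. It assumes each reference-source pair yields a cost volume of shape (D, H, W) and a per-pixel attention score, fuses the volumes with softmax weights over the pairs, and reads out depth with a soft argmin. The function names, tensor shapes, and the soft-argmin readout are all illustrative assumptions that the abstract does not specify.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_pairs(pair_volumes, pair_scores):
    """Fuse per-pair cost volumes with per-pixel attention weights.

    pair_volumes: (N, D, H, W) array, one hypothetical cost volume per stereo pair.
    pair_scores:  (N, H, W) array of unnormalized attention scores (assumed given).
    Returns a fused volume of shape (D, H, W).
    """
    weights = softmax(pair_scores, axis=0)            # (N, H, W), sums to 1 over pairs
    return (weights[:, None, :, :] * pair_volumes).sum(axis=0)

def soft_argmin_depth(volume, depth_values):
    """Expected depth under a softmax over negated costs (an assumed readout)."""
    prob = softmax(-volume, axis=0)                   # (D, H, W)
    return np.tensordot(depth_values, prob, axes=(0, 0))  # (H, W)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volumes = rng.random((3, 8, 4, 5))                # 3 pairs, 8 depth hypotheses
    scores = rng.standard_normal((3, 4, 5))
    depths = np.linspace(0.5, 4.0, 8)
    fused = aggregate_pairs(volumes, scores)
    print(soft_argmin_depth(fused, depths).shape)
```

With equal scores the fusion reduces to a plain average over pairs. A learned score lets an occluded or poorly matched pair contribute less at the affected pixels, which is the general motivation for attention-based fusion.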