CV · AI · Dec 9, 2024

MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds

arXiv:2412.06974v1134 citations · h-index: 8 · CVPR
Originality: Incremental advance
AI Analysis

This paper addresses the computational inefficiency and error accumulation of multi-view 3D reconstruction in computer vision applications. It is a substantial but incremental improvement over existing sparse-view methods.

The paper tackles sparse multi-view scene reconstruction with MV-DUSt3R+, a single-stage feed-forward network that processes multiple views simultaneously. This avoids error-prone pairwise reconstructions and expensive global optimization, and the authors report significant improvements in reconstruction, pose estimation, and novel view synthesis over prior methods.

Recent sparse multi-view scene reconstruction advances like DUSt3R and MASt3R no longer require camera calibration and camera pose estimation. However, they only process a pair of views at a time to infer pixel-aligned pointmaps. When dealing with more than two views, a combinatorial number of error-prone pairwise reconstructions are usually followed by an expensive global optimization, which often fails to rectify the pairwise reconstruction errors. To handle more views, reduce errors, and improve inference time, we propose the fast single-stage feed-forward network MV-DUSt3R. At its core are multi-view decoder blocks which exchange information across any number of views while considering one reference view. To make our method robust to reference view selection, we further propose MV-DUSt3R+, which employs cross-reference-view blocks to fuse information across different reference view choices. To further enable novel view synthesis, we extend both by adding and jointly training Gaussian splatting heads. Experiments on multi-view stereo reconstruction, multi-view pose estimation, and novel view synthesis confirm that our methods improve significantly upon prior art. Code will be released.
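To make the "combinatorial number" of pairwise reconstructions concrete, the following minimal sketch counts how many view pairs a pairwise pipeline would process as the number of input views grows. It assumes every unordered pair of views is reconstructed once. Real pipelines may instead use ordered pairs or a subsampled pair graph, so the counts are illustrative only. The function name `pairwise_reconstructions` is hypothetical and does not come from the paper or its code.

```python
from math import comb


def pairwise_reconstructions(num_views: int) -> int:
    """Count unordered view pairs, i.e. N * (N - 1) / 2.

    Assumption: the pairwise pipeline reconstructs every unordered
    pair exactly once (a complete pair graph).
    """
    if num_views < 2:
        return 0
    return comb(num_views, 2)


if __name__ == "__main__":
    # A single-stage network processes all views in one forward pass,
    # while the number of pairs grows quadratically with the view count.
    for n in (2, 4, 8, 16, 24):
        print(f"{n:>2} views: {pairwise_reconstructions(n):>3} pairs vs. 1 single-stage pass")
```

Under this assumption the pair count grows quadratically with the number of views, which is the cost the single-stage design is meant to avoid.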

Code Implementations: 1 repo
