CV | Sep 18, 2025

Lightweight and Accurate Multi-View Stereo with Confidence-Aware Diffusion Model

arXiv:2509.15220v1 | 8 citations | h-index: 11 | Has Code | IEEE Trans Pattern Anal Mach Intell
Originality: Highly original
AI Analysis

This work addresses computational efficiency and accuracy in 3D reconstruction for computer vision applications, representing a novel application of diffusion models rather than an incremental improvement.

The paper tackles 3D geometry reconstruction from calibrated images by introducing diffusion models into multi-view stereo (MVS), proposing two methods: DiffMVS achieves competitive performance with state-of-the-art efficiency in runtime and GPU memory, while CasDiffMVS achieves state-of-the-art performance on DTU, Tanks & Temples, and ETH3D benchmarks.

To reconstruct 3D geometry from calibrated images, learning-based multi-view stereo (MVS) methods typically perform multi-view depth estimation and then fuse the depth maps into a mesh or point cloud. To improve computational efficiency, many methods initialize a coarse depth map and then gradually refine it at higher resolutions. Recently, diffusion models have achieved great success in generation tasks. Starting from random noise, diffusion models gradually recover the sample with an iterative denoising process. In this paper, we propose a novel MVS framework that introduces diffusion models into MVS. Specifically, we formulate depth refinement as a conditional diffusion process. Considering the discriminative characteristic of depth estimation, we design a condition encoder to guide the diffusion process. To improve efficiency, we propose a novel diffusion network combining a lightweight 2D U-Net and a convolutional GRU. Moreover, we propose a novel confidence-based sampling strategy to adaptively sample depth hypotheses based on the confidence estimated by the diffusion model. Based on our novel MVS framework, we propose two novel MVS methods, DiffMVS and CasDiffMVS. DiffMVS achieves competitive performance with state-of-the-art efficiency in run-time and GPU memory. CasDiffMVS achieves state-of-the-art performance on DTU, Tanks & Temples, and ETH3D. Code is available at: https://github.com/cvg/diffmvs.
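The abstract does not say how the confidence-based sampling strategy works, so the following is only a rough sketch of the general idea: a pixel with low confidence gets depth hypotheses spread over a wide range, and a pixel with high confidence gets a narrow one. The function name `sample_depth_hypotheses`, the linear mapping from confidence to search radius, and the `min_radius` / `max_radius` parameters are all assumptions made for illustration. They are not taken from the paper or from the DiffMVS code.

```python
def sample_depth_hypotheses(depth, confidence, num_hypotheses=4,
                            min_radius=0.01, max_radius=0.10):
    """Return evenly spaced depth hypotheses around one pixel's estimate.

    Assumption (not from the paper): the search radius, expressed as a
    fraction of the current depth, shrinks linearly from max_radius at
    confidence 0 to min_radius at confidence 1.
    """
    if num_hypotheses < 2:
        raise ValueError("need at least two hypotheses")
    c = min(max(confidence, 0.0), 1.0)  # clamp confidence to [0, 1]
    radius = depth * (max_radius + c * (min_radius - max_radius))
    step = 2.0 * radius / (num_hypotheses - 1)
    # Hypotheses span [depth - radius, depth + radius] inclusive.
    return [depth - radius + i * step for i in range(num_hypotheses)]


if __name__ == "__main__":
    # Low confidence searches a wide range, high confidence a narrow one.
    print(sample_depth_hypotheses(2.0, confidence=0.1))
    print(sample_depth_hypotheses(2.0, confidence=0.9))
```

In a real pipeline this would run per pixel over a whole depth map, with the confidence coming from the network at each refinement step. The per-pixel scalar version here is only meant to make the adaptive range concrete.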

