CV
Mar 25, 2025

SparseGS-W: Sparse-View 3D Gaussian Splatting in the Wild with Generative Priors

arXiv:2503.19452v1
7 citations
h-index: 2
IEEE Transactions on Circuits and Systems for Video Technology (Print)
Originality: Highly original
AI Analysis

This paper addresses the challenge of 3D scene reconstruction from unconstrained in-the-wild images for computer vision applications. It offers a novel method that needs as few as five training images, compared with the roughly 1000 used by existing dense-view approaches.

The paper tackles the problem of synthesizing novel views of large-scale outdoor scenes from extremely sparse input images (as few as five), achieving state-of-the-art performance on datasets like PhotoTourism and Tanks and Temples with improvements in metrics such as FID, ClipIQA, and MUSIQ.

Synthesizing novel views of large-scale scenes from unconstrained in-the-wild images is an important but challenging task in computer vision. Existing methods, which optimize per-image appearance and transient occlusion through implicit neural networks from dense training views (approximately 1000 images), struggle to perform effectively under sparse input conditions, resulting in noticeable artifacts. To this end, we propose SparseGS-W, a novel framework based on 3D Gaussian Splatting that enables the reconstruction of complex outdoor scenes and handles occlusions and appearance changes with as few as five training images. We leverage geometric priors and constrained diffusion priors to compensate for the lack of multi-view information from extremely sparse input. Specifically, we propose a plug-and-play Constrained Novel-View Enhancement module to iteratively improve the quality of rendered novel views during the Gaussian optimization process. Furthermore, we propose an Occlusion Handling module, which flexibly removes occlusions utilizing the inherent high-quality inpainting capability of constrained diffusion priors. Both modules are capable of extracting appearance features from any user-provided reference image, enabling flexible modeling of illumination-consistent scenes. Extensive experiments on the PhotoTourism and Tanks and Temples datasets demonstrate that SparseGS-W achieves state-of-the-art performance not only in full-reference metrics, but also in commonly used non-reference metrics such as FID, ClipIQA, and MUSIQ.
