CV · May 12

GeoQuery: Geometry-Query Diffusion for Sparse-View Reconstruction

arXiv:2605.12399 · 20.2
Predicted impact: top 42% in CV · last 90 days
Originality: Incremental advance
AI Analysis

For 3D reconstruction and novel view synthesis, GeoQuery addresses the failure of multi-view self-attention in diffusion models when rendered views are heavily corrupted, improving robustness under sparse-view constraints.

GeoQuery introduces a geometry-guided diffusion framework that uses depth maps and camera poses to construct geometry-aligned proxy queries. These proxy queries replace the corrupted rendering features in cross-view attention, which enables robust 3D reconstruction under extreme view sparsity.

3D Gaussian Splatting (3DGS) has emerged as a prominent paradigm for 3D reconstruction and novel view synthesis. However, it remains vulnerable to severe artifacts when trained under sparse-view constraints. While recent methods attempt to rectify artifacts in rendered views using image diffusion models, they typically rely on multi-view self-attention to retrieve information from reference images. We observe that this mechanism often fails when the rendered novel views output by 3DGS are heavily corrupted: damaged query features lead to erroneous cross-view retrieval, resulting in inconsistent rendering refinement. To address this, we propose GeoQuery, a geometry-guided diffusion framework that integrates generative priors with explicit geometric cues via a novel Geometry-guided Cross-view Attention (GCA) mechanism. First, by leveraging predicted depth maps and camera poses, we construct a geometry-induced correspondence field to sample reference features, forming a geometry-aligned proxy query that replaces the corrupted rendering features. Furthermore, we design a new cross-view feature aggregation pipeline, in which we restrict the cross-view attention to a local window around each proxy query to effectively retrieve useful features while suppressing spurious matches. GeoQuery can be seamlessly integrated into existing diffusion-based pipelines, enabling robust reconstruction even under extreme view sparsity. Extensive experiments on sparse-view novel view synthesis and rendering artifact removal demonstrate the effectiveness of our approach.
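The two steps described in the abstract (a geometry-induced correspondence field, then attention restricted to a local window around each proxy query) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a pinhole camera with intrinsics `K` shared by both views, a novel view and a reference view of the same resolution, nearest-neighbour feature sampling, and reference features used as both keys and values. The function names `correspondence_field` and `local_window_attention`, and the `radius` parameter, are hypothetical.

```python
import numpy as np


def correspondence_field(depth, K, T_novel_to_ref):
    """Map every novel-view pixel to (u, v) coordinates in the reference view.

    depth: (H, W) predicted depth of the novel view.
    K: (3, 3) pinhole intrinsics (assumed identical for both views).
    T_novel_to_ref: (4, 4) rigid transform, novel-camera to reference-camera frame.
    Returns uv of shape (H, W, 2) and a boolean validity mask of shape (H, W).
    """
    H, W = depth.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Back-project pixels to 3D points in the novel camera frame.
    pts = (pix @ np.linalg.inv(K).T) * depth[..., None]
    # Move the points into the reference camera frame.
    pts_ref = pts @ T_novel_to_ref[:3, :3].T + T_novel_to_ref[:3, 3]
    # Project into the reference image plane.
    proj = pts_ref @ K.T
    z = proj[..., 2]
    uv = proj[..., :2] / np.clip(z, 1e-8, None)[..., None]
    # A correspondence is valid if it lies in front of the camera and in bounds.
    valid = (
        (z > 0)
        & (uv[..., 0] >= 0) & (uv[..., 0] <= W - 1)
        & (uv[..., 1] >= 0) & (uv[..., 1] <= H - 1)
    )
    return uv, valid


def local_window_attention(ref_feat, uv, valid, radius=1):
    """Attend from geometry-aligned proxy queries to a local reference window.

    ref_feat: (H, W, C) reference features, used here as keys and values.
    uv, valid: output of correspondence_field.
    radius: half-size of the square window (hypothetical parameter).
    Returns the aggregated features and the proxy queries, both (H, W, C).
    """
    H, W, C = ref_feat.shape
    # Nearest-neighbour sampling keeps the sketch short.
    cu = np.clip(np.rint(uv[..., 0]), 0, W - 1).astype(int)
    cv = np.clip(np.rint(uv[..., 1]), 0, H - 1).astype(int)
    # The proxy query is a clean reference feature, not a corrupted render feature.
    proxy_q = ref_feat[cv, cu]
    offs = np.arange(-radius, radius + 1)
    dv, du = np.meshgrid(offs, offs, indexing="ij")
    # Window coordinates, clamped at the image border: (H, W, N).
    wv = np.clip(cv[..., None] + dv.ravel(), 0, H - 1)
    wu = np.clip(cu[..., None] + du.ravel(), 0, W - 1)
    keys = ref_feat[wv, wu]  # (H, W, N, C)
    # Scaled dot-product attention over the N window positions only.
    logits = np.einsum("hwc,hwnc->hwn", proxy_q, keys) / np.sqrt(C)
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = np.einsum("hwn,hwnc->hwc", weights, keys)
    # Pixels without a valid correspondence receive no retrieved features.
    out[~valid] = 0.0
    return out, proxy_q
```

Restricting the softmax to the window means a query can only match features near its geometric correspondence, which is the sense in which spurious matches are suppressed. In a real pipeline the sampling would more likely be bilinear and the keys and values would come from learned projections; both are simplified away here.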

