CV | Nov 24, 2024

Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors

Soumava Paul, Prakhar Kaushik, Alan Yuille

arXiv:2411.15966v312.88 citations | h-index: 12 | Trans. Mach. Learn. Res.

Originality: Incremental advance

AI Analysis

This paper addresses the problem of reconstructing complex 360° scenes without known camera poses, with applications in computer vision and graphics. It represents an incremental advance over existing pose-free techniques.

The paper tackles pose-free 3D scene reconstruction from sparse 2D images by proposing a generative model that inpaints missing details and removes artifacts in novel views and depth maps, achieving competitive performance with state-of-the-art posed methods on benchmarks like MipNeRF360 and DL3DV-10K.

In this work, we introduce a generative approach for pose-free (without camera parameters) reconstruction of 360° scenes from a sparse set of 2D images. Scene reconstruction from incomplete, pose-free observations is usually regularized with depth estimation or 3D foundational priors. While recent advances have enabled sparse-view reconstruction of large, complex scenes (with a high degree of foreground and background detail) with known camera poses using view-conditioned generative priors, these methods cannot be directly adapted to the pose-free setting, where ground-truth poses are not available during evaluation. To address this, we propose an image-to-image generative model designed to inpaint missing details and remove artifacts in novel-view renders and depth maps of a 3D scene. We introduce context and geometry conditioning using Feature-wise Linear Modulation (FiLM) layers as a lightweight alternative to cross-attention, and we also propose a novel confidence measure for 3D Gaussian splat representations to allow for better detection of these artifacts. By progressively integrating these novel views in a Gaussian-SLAM-inspired process, we achieve a multi-view-consistent 3D representation. Evaluations on the MipNeRF360 and DL3DV-10K benchmark datasets demonstrate that our method surpasses existing pose-free techniques and performs competitively with state-of-the-art posed (precomputed camera parameters are given) reconstruction methods in complex 360° scenes. Our project page provides additional results, videos, and code.
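
To make the FiLM conditioning mentioned in the abstract concrete, here is a minimal NumPy sketch of the general technique: a conditioning vector is projected to a per-channel scale (gamma) and shift (beta), which are then applied to a feature map. This is not the authors' implementation. The function name `film`, the linear projections, and all shapes are assumptions chosen for illustration only.

```python
import numpy as np


def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Generic FiLM layer (illustrative, not the paper's code).

    features: (batch, channels, height, width) feature map
    cond:     (batch, cond_dim) conditioning vector (hypothetical context embedding)
    w_*, b_*: weights and biases of two linear projections, cond_dim -> channels
    """
    gamma = cond @ w_gamma + b_gamma  # per-channel scale, shape (batch, channels)
    beta = cond @ w_beta + b_beta     # per-channel shift, shape (batch, channels)
    # Broadcast scale and shift over the spatial dimensions.
    return gamma[:, :, None, None] * features + beta[:, :, None, None]


rng = np.random.default_rng(0)
batch, channels, height, width, cond_dim = 2, 4, 8, 8, 16
features = rng.standard_normal((batch, channels, height, width))
cond = rng.standard_normal((batch, cond_dim))

# With zero projection weights, gamma = 1 and beta = 0, so FiLM is the identity.
w_zero = np.zeros((cond_dim, channels))
identity_out = film(features, cond, w_zero, np.ones(channels), w_zero, np.zeros(channels))

# With random projections, each channel is scaled and shifted based on `cond`.
w_gamma = rng.standard_normal((cond_dim, channels))
w_beta = rng.standard_normal((cond_dim, channels))
modulated = film(features, cond, w_gamma, np.ones(channels), w_beta, np.zeros(channels))
```

In this sketch the conditioning adds only two linear projections and one elementwise multiply-add per feature map, whereas cross-attention computes attention weights between feature positions and conditioning tokens. That difference is one plausible reading of "lightweight" in the abstract.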
