GR CV
Mar 16, 2025

Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View

Xianzu Wu, Zhenxin Ai, Harry Yang, Ser-Nam Lim, Jun Liu, Huan Wang

arXiv:2503.12553v2 · 3 citations · h-index: 9 · IEEE Transactions on Circuits and Systems for Video Technology (Print)

Originality: Incremental advance

AI Analysis

This work addresses the challenge of reconstructing detailed, consistent 3D outdoor scenes from a single image, a problem relevant to applications in computer vision and graphics. The advance appears incremental because it builds on existing monocular depth and normal estimation techniques.

The paper tackles single-view 3D scene reconstruction, particularly for high-fidelity outdoor scenes. It introduces Niagara, a framework that takes monocular depth and normal estimates as input to capture fine details and uses a geometric affine field with 3D self-attention to enforce structural consistency. The authors report state-of-the-art geometric accuracy and visual fidelity compared with methods such as Flash3D.

Recent advances in single-view 3D scene reconstruction have highlighted the challenges in capturing fine geometric details and ensuring structural consistency, particularly in high-fidelity outdoor scene modeling. This paper presents Niagara, a new single-view 3D scene reconstruction framework that can faithfully reconstruct challenging outdoor scenes from a single input image for the first time. Our approach integrates monocular depth and normal estimation as input, which substantially improves its ability to capture fine details and mitigates common issues such as geometric detail loss and deformation. Additionally, we introduce a geometric affine field (GAF) and 3D self-attention as geometric constraints, combining the structural properties of explicit geometry with the adaptability of implicit feature fields and striking a balance between efficient rendering and high-fidelity reconstruction. Finally, our framework adopts a specialized encoder-decoder architecture in which a depth-based 3D Gaussian decoder predicts 3D Gaussian parameters that can be used for novel view synthesis. Extensive results and analyses suggest that Niagara surpasses prior SoTA approaches such as Flash3D in both single-view and dual-view settings, significantly enhancing geometric accuracy and visual fidelity, especially in outdoor scenes.
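The abstract does not say how a depth-based decoder turns per-pixel depth into 3D positions. As a rough illustration only, the sketch below shows the standard pinhole back-projection that depth-driven pipelines commonly use to lift pixels into camera space, where the resulting points could serve as candidate Gaussian centers. The function name `backproject_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy depth map are all assumptions made for this example; none of it is taken from the Niagara paper or code.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def backproject_depth(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Lift each pixel (u, v) with depth d to a camera-space point.

    Assumes an ideal pinhole camera (hypothetical intrinsics, no distortion):
        x = (u - cx) * d / fx
        y = (v - cy) * d / fy
        z = d
    """
    points: List[Point] = []
    for v, row in enumerate(depth):      # v indexes image rows
        for u, d in enumerate(row):      # u indexes image columns
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append((x, y, d))
    return points


# Toy 2x2 depth map in arbitrary units (illustrative values only).
depth_map = [[2.0, 2.0], [4.0, 4.0]]
centers = backproject_depth(depth_map, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

In a full 3D Gaussian pipeline each such point would also need a scale, rotation, opacity, and color, which a learned decoder would predict. That part is outside this sketch.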
