CV · Dec 8, 2024

GBR: Generative Bundle Refinement for High-fidelity Gaussian Splatting with Enhanced Mesh Reconstruction

Jianing Zhang, Yuchao Zheng, Ziwei Li, Qionghai Dai, Xiaoyun Yuan

arXiv:2412.05908v2 · 2.0 · h-index: 10 · IEEE Transactions on Circuits and Systems for Video Technology (Print)

Originality: Incremental advance

AI Analysis

This work addresses high-fidelity 3D reconstruction under sparse-view conditions, for applications such as virtual reality and heritage preservation. It is an incremental advance that combines existing techniques, including neural networks and diffusion models.

The paper tackles sparse-view 3D scene reconstruction and rendering with Gaussian splatting, which struggles when input views are limited. It proposes GBR, a method that integrates neural bundle adjustment to improve geometry accuracy and generative depth refinement to improve geometry fidelity. The method achieves significant performance improvements on widely used datasets and reconstructs large-scale real-world scenes in detail from only 6 views.

Gaussian splatting has gained attention for its efficient representation and rendering of 3D scenes using continuous Gaussian primitives. However, it struggles with sparse-view inputs due to limited geometric and photometric information, causing ambiguities in depth, shape, and texture. We propose GBR: Generative Bundle Refinement, a method for high-fidelity Gaussian splatting and meshing using only 4-6 input views. GBR integrates a neural bundle adjustment module to enhance geometry accuracy and a generative depth refinement module to improve geometry fidelity. More specifically, the neural bundle adjustment module integrates a foundation network to produce initial 3D point maps and point matches from unposed images, followed by bundle adjustment optimization to improve multiview consistency and point cloud accuracy. The generative depth refinement module employs a diffusion-based strategy to enhance geometric details and fidelity while preserving the scale. Finally, for Gaussian splatting optimization, we propose a multimodal loss function incorporating depth and normal consistency, geometric regularization, and pseudo-view supervision, providing robust guidance under sparse-view conditions. Experiments on widely used datasets show that GBR significantly outperforms existing methods under sparse-view inputs. Additionally, GBR demonstrates the ability to reconstruct and render large-scale real-world scenes, such as the Pavilion of Prince Teng and the Great Wall, with remarkable details using only 6 views.
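
The abstract names the ingredients of the multimodal loss (depth and normal consistency, geometric regularization, pseudo-view supervision) but not their exact form. The following is a minimal sketch of how such terms might be combined as a weighted sum, not the authors' formulation. Everything in it is an assumption made for illustration: the L1 and cosine forms of the terms, the flattening regularizer on Gaussian scales, the function names, the dictionary keys, and the weights.

```python
import numpy as np


def l1(a, b):
    """Mean absolute difference between two arrays of equal shape."""
    return float(np.mean(np.abs(a - b)))


def normal_consistency(n_rendered, n_reference):
    """One minus the mean cosine similarity of two unit-normal maps (H, W, 3)."""
    cos = np.sum(n_rendered * n_reference, axis=-1)
    return float(np.mean(1.0 - cos))


def geometric_reg(scales):
    """Hypothetical regularizer: mean of each Gaussian's smallest axis scale.

    Penalizing it pushes Gaussians toward flat, surface-aligned shapes.
    This specific form is an assumption, not taken from the paper.
    """
    return float(np.mean(np.min(scales, axis=-1)))


def multimodal_loss(rendered, target, scales, weights):
    """Weighted sum of the illustrative terms; returns (total, per-term dict)."""
    terms = {
        "photo": l1(rendered["rgb"], target["rgb"]),
        "depth": l1(rendered["depth"], target["depth"]),
        "normal": normal_consistency(rendered["normal"], target["normal"]),
        # Pseudo-view term: compare a render at an unobserved pose with a
        # synthesized reference image for that pose (assumed to be given).
        "pseudo": l1(rendered["pseudo_rgb"], target["pseudo_rgb"]),
        "reg": geometric_reg(scales),
    }
    total = sum(weights[name] * value for name, value in terms.items())
    return total, terms


# Toy usage with synthetic 4x4 maps and five Gaussians.
H, W = 4, 4
rgb = np.full((H, W, 3), 0.5)
depth = np.ones((H, W))
normal = np.zeros((H, W, 3))
normal[..., 2] = 1.0  # all normals point along +z
target = dict(rgb=rgb, depth=depth, normal=normal, pseudo_rgb=rgb)
rendered = dict(target, depth=depth + 0.5)  # depth is off by 0.5 everywhere
scales = np.zeros((5, 3))
weights = dict(photo=1.0, depth=1.0, normal=1.0, pseudo=1.0, reg=1.0)

total, terms = multimodal_loss(rendered, target, scales, weights)
```

In this toy setup only the depth term is nonzero, so the total equals the depth weight times the 0.5 offset. In practice the relative weights would need tuning, and the depth and normal references would come from the refinement stages described in the abstract.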
