CV · Mar 29, 2024

SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior

arXiv:2403.20079v1 · 42 citations · h-index: 4 · WACV
Originality: Incremental advance
AI Analysis

This work addresses a critical bottleneck for autonomous driving simulation by improving view synthesis from sparse training data, representing an incremental advance over existing neural rendering methods.

The paper tackles the problem of maintaining rendering quality at significantly deviated viewpoints in street scene novel view synthesis by enhancing 3D Gaussian Splatting with a diffusion model prior and multi-modal data, achieving state-of-the-art results in broader view rendering.

Novel View Synthesis (NVS) for street scenes plays a critical role in autonomous driving simulation. The current mainstream technique for achieving it is neural rendering, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Although exciting progress has been made, current methods struggle on street scenes to maintain rendering quality at viewpoints that deviate significantly from the training viewpoints. This issue stems from the sparse training views captured by a fixed camera on a moving vehicle. To tackle this problem, we propose a novel approach that enhances the capacity of 3DGS by leveraging the prior from a Diffusion Model along with complementary multi-modal data. Specifically, we first fine-tune a Diffusion Model by adding images from adjacent frames as a condition, while exploiting depth data from LiDAR point clouds to supply additional spatial information. We then apply the Diffusion Model to regularize the 3DGS at unseen views during training. Experimental results validate the effectiveness of our method compared with current state-of-the-art models and demonstrate its advantage in rendering images from broader views.
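The abstract does not say how depth is obtained from the LiDAR point clouds. One common way to do it, shown here only as a minimal sketch and not as the authors' implementation, is to project points that are already in the camera frame through a pinhole model and keep the nearest depth per pixel. The function name `project_lidar_to_depth`, its parameters, and the intrinsics used below are all hypothetical.

```python
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]   # (x, y, z) in the camera frame, z pointing forward
Pixel = Tuple[int, int]              # (u, v) = (column, row)


def project_lidar_to_depth(
    points_cam: Iterable[Point],
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> Dict[Pixel, float]:
    """Project camera-frame points to a sparse depth map (hypothetical helper).

    Assumes a distortion-free pinhole camera. Returns a dict mapping each
    hit pixel to the smallest depth among the points that land on it.
    """
    depth: Dict[Pixel, float] = {}
    for x, y, z in points_cam:
        if z <= 0.0:
            continue  # point is behind the camera
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if not (0 <= u < width and 0 <= v < height):
            continue  # point projects outside the image
        # Keep the closest surface when several points share a pixel.
        if (u, v) not in depth or z < depth[(u, v)]:
            depth[(u, v)] = z
    return depth


# Toy example with made-up intrinsics and points.
points = [
    (0.0, 0.0, 10.0),   # image centre, 10 m away
    (0.0, 0.0, 5.0),    # same pixel, closer: overrides the point above
    (1.0, 0.0, 10.0),   # 1 m to the right
    (0.0, 0.0, -1.0),   # behind the camera: dropped
    (100.0, 0.0, 1.0),  # far outside the field of view: dropped
]
sparse_depth = project_lidar_to_depth(
    points, fx=100.0, fy=100.0, cx=50.0, cy=40.0, width=100, height=80
)
print(sparse_depth)
```

The result is sparse because most pixels receive no LiDAR return. How such a map would be densified or fed to the Diffusion Model as a condition is not described in the abstract and is left open here.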

