CV · Apr 1, 2025

ADGaussian: Generalizable Gaussian Splatting for Autonomous Driving with Multi-modal Inputs

arXiv:2504.00437v1 · 2 citations · h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses high-quality rendering for autonomous driving systems, but it is an incremental advance: it builds on prior Gaussian Splatting methods by adding multi-modal inputs.

The paper tackles street scene reconstruction from single-view input by proposing ADGaussian, a method that jointly optimizes image and depth features, using sparse LiDAR depth as an additional input. It achieves state-of-the-art performance and superior zero-shot generalization on the Waymo and KITTI datasets.

We present a novel approach, termed ADGaussian, for generalizable street scene reconstruction. The proposed method enables high-quality rendering from single-view input. Unlike prior Gaussian Splatting methods that primarily focus on geometry refinement, we emphasize the importance of joint optimization of image and depth features for accurate Gaussian prediction. To this end, we first incorporate sparse LiDAR depth as an additional input modality, formulating the Gaussian prediction process as a joint learning framework over visual information and geometric cues. Furthermore, we propose a multi-modal feature matching strategy coupled with a multi-scale Gaussian decoding model to enhance the joint refinement of multi-modal features, thereby enabling efficient multi-modal Gaussian learning. Extensive experiments on two large-scale autonomous driving datasets, Waymo and KITTI, demonstrate that ADGaussian achieves state-of-the-art performance and exhibits superior zero-shot generalization in novel-view shifting.
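The abstract does not describe how the LiDAR input is prepared or how depth becomes Gaussians. The following is a minimal sketch under stated assumptions, not ADGaussian's code: it assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and LiDAR points already transformed into the camera frame. It rasterizes the point cloud into a sparse depth map (keeping the nearest return per pixel) and then lifts each pixel that has a depth value to a 3D point that could serve as a pixel-aligned Gaussian center, as generalizable splatting pipelines commonly do. Both function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]
DepthMap = List[List[Optional[float]]]


def project_lidar_to_sparse_depth(
    points_cam: Sequence[Point3],
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> DepthMap:
    """Rasterize camera-frame LiDAR points into a sparse depth map.

    Hypothetical helper: pixels without a LiDAR return stay None; when several
    points fall on the same pixel, the nearest one wins (a simple z-buffer).
    """
    depth: DepthMap = [[None] * width for _ in range(height)]
    for x, y, z in points_cam:
        if z <= 0:  # point is behind the camera, cannot be projected
            continue
        u = round(fx * x / z + cx)  # pinhole projection to pixel column
        v = round(fy * y / z + cy)  # pinhole projection to pixel row
        if 0 <= u < width and 0 <= v < height:
            current = depth[v][u]
            if current is None or z < current:
                depth[v][u] = z
    return depth


def unproject_to_gaussian_means(
    depth: DepthMap, fx: float, fy: float, cx: float, cy: float
) -> List[Point3]:
    """Lift every pixel that has a depth value to a 3D point in the camera frame.

    In a learned pipeline a network would predict dense depth and the remaining
    Gaussian attributes per pixel; here only the sparse LiDAR pixels are lifted.
    """
    means: List[Point3] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None:
                continue
            means.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return means


if __name__ == "__main__":
    pts = [(1.0, 0.5, 10.0), (2.0, 1.0, 20.0), (0.0, 0.0, -5.0)]
    d = project_lidar_to_sparse_depth(pts, 100.0, 100.0, 16.0, 16.0, 32, 32)
    print(unproject_to_gaussian_means(d, 100.0, 100.0, 16.0, 16.0))
```

In the usage example, the first two points project to the same pixel, so only the nearer one survives, and the point behind the camera is dropped. Keeping the nearest return is one reasonable occlusion rule for a sparse depth input; other choices (such as averaging) are possible.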
