CV | Sep 19, 2024

DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input

Qijian Tian, Xin Tan, Yuan Xie, Lizhuang Ma

arXiv:2409.12753v225.949 citations | h-index: 18 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of real-time 3D reconstruction for autonomous driving systems. It appears incremental because it builds on Gaussian Splatting, adding new components for pose and depth estimation.

The authors tackled the problem of reconstructing driving scenes from sparse, limited-overlap surround-view images with unknown camera extrinsics, achieving real-time feed-forward reconstruction that outperforms state-of-the-art methods on the nuScenes dataset.

We propose DrivingForward, a feed-forward Gaussian Splatting model that reconstructs driving scenes from flexible surround-view input. Driving scene images from vehicle-mounted cameras are typically sparse, with limited overlap, and the movement of the vehicle further complicates the acquisition of camera extrinsics. To tackle these challenges and achieve real-time reconstruction, we jointly train a pose network, a depth network, and a Gaussian network to predict the Gaussian primitives that represent the driving scenes. The pose network and depth network determine the position of the Gaussian primitives in a self-supervised manner, without using depth ground truth and camera extrinsics during training. The Gaussian network independently predicts primitive parameters from each input image, including covariance, opacity, and spherical harmonics coefficients. At the inference stage, our model can achieve feed-forward reconstruction from flexible multi-frame surround-view input. Experiments on the nuScenes dataset show that our model outperforms existing state-of-the-art feed-forward and scene-optimized reconstruction methods in terms of reconstruction.
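
The abstract states that the pose and depth networks determine where the Gaussian primitives sit. One common way to turn a predicted depth map into 3D positions is pinhole back-projection, sketched below with NumPy. This is an illustrative assumption, not the authors' implementation: the function name `unproject_depth`, the `cam_to_world` argument, and the choice of one candidate center per pixel are all hypothetical.

```python
import numpy as np

def unproject_depth(depth, K, cam_to_world=None):
    """Back-project a depth map to 3D points under a pinhole camera model.

    depth:        (H, W) array of per-pixel depth along the camera z-axis.
    K:            (3, 3) camera intrinsics matrix.
    cam_to_world: optional (4, 4) rigid transform (hypothetical input here;
                  in a pose-free setting it would come from a pose estimate).
    Returns an (H*W, 3) array of candidate Gaussian centers.
    """
    H, W = depth.shape
    # Homogeneous pixel coordinates (u, v, 1), flattened in row-major order.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)
    pixels = pixels.reshape(-1, 3).astype(np.float64)
    # K^-1 maps each pixel to a ray with unit z; scaling by depth gives the point.
    rays = pixels @ np.linalg.inv(K).T
    points_cam = rays * depth.reshape(-1, 1)
    if cam_to_world is None:
        return points_cam
    # Apply the rigid transform to move points into a shared frame.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return points_cam @ R.T + t
```

Under these assumptions, each surround-view image would contribute its own set of points, and the remaining per-primitive parameters the abstract lists (covariance, opacity, spherical harmonics coefficients) would be predicted separately and attached to these centers.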
