CV, AI · May 23, 2024

MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes

arXiv:2405.14475v472 citations · h-index: 14
Originality: Highly original
AI Analysis

MagicDrive3D addresses the need for flexible 3D scene generation in unbounded autonomous driving scenarios. Because it trains on routinely collected driving data, it reduces the data acquisition burden compared to existing methods that rely on dense view collection.

The paper tackles controllable 3D street scene generation for autonomous driving. It introduces MagicDrive3D, a framework that combines video-based view synthesis with 3D Gaussian Splatting (3DGS) generation. The resulting scenes are diverse and high-quality, support any-view rendering, and improve downstream tasks such as BEV segmentation.

Controllable generative models for images and videos have seen significant success, yet 3D scene generation, especially in unbounded scenarios like autonomous driving, remains underdeveloped. Existing methods lack flexible controllability and often rely on dense view data collection in controlled environments, limiting their generalizability across common datasets (e.g., nuScenes). In this paper, we introduce MagicDrive3D, a novel framework for controllable 3D street scene generation that combines video-based view synthesis with 3D representation (3DGS) generation. It supports multi-condition control, including road maps, 3D objects, and text descriptions. Unlike previous approaches that require 3D representation before training, MagicDrive3D first trains a multi-view video generation model to synthesize diverse street views. This method utilizes routinely collected autonomous driving data, reducing data acquisition challenges and enriching 3D scene generation. In the 3DGS generation step, we introduce Fault-Tolerant Gaussian Splatting to address minor errors and use monocular depth for better initialization, alongside appearance modeling to manage exposure discrepancies across viewpoints. Experiments show that MagicDrive3D generates diverse, high-quality 3D driving scenes, supports any-view rendering, and enhances downstream tasks like BEV segmentation, demonstrating its potential for autonomous driving simulation and beyond.
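The abstract mentions using monocular depth to initialize the 3DGS step but does not say how that depth is brought to metric scale. Monocular depth estimators commonly predict depth only up to an unknown scale and shift, so one standard option is a least-squares fit against a few points of known depth. The sketch below illustrates that generic technique only; it is an assumption, not the paper's procedure, and the function names and the sparse metric depths are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_scale_shift(mono: Sequence[float], metric: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of metric ~= scale * mono + shift.

    `mono` holds relative monocular depths and `metric` holds known depths
    at the same pixels (hypothetical inputs for illustration).
    """
    n = len(mono)
    if n != len(metric) or n < 2:
        raise ValueError("need at least two paired depth samples")
    mean_mono = sum(mono) / n
    mean_metric = sum(metric) / n
    # Closed-form simple linear regression: slope = cov / var.
    var = sum((m - mean_mono) ** 2 for m in mono)
    if var == 0:
        raise ValueError("monocular depths are constant; scale is undefined")
    cov = sum((m - mean_mono) * (t - mean_metric) for m, t in zip(mono, metric))
    scale = cov / var
    shift = mean_metric - scale * mean_mono
    return scale, shift


def align_depth(mono_depth: Sequence[float], scale: float, shift: float) -> List[float]:
    """Map relative depths to metric depths with the fitted affine transform."""
    return [scale * d + shift for d in mono_depth]


if __name__ == "__main__":
    sparse_mono = [1.0, 2.0, 3.0, 4.0]
    sparse_metric = [3.5, 6.0, 8.5, 11.0]  # synthetic values: 2.5 * mono + 1.0
    s, b = fit_scale_shift(sparse_mono, sparse_metric)
    print(align_depth([0.5, 5.0], s, b))
```

Aligned depths of this kind could then be back-projected through the camera intrinsics to seed Gaussian positions. A robust variant (for example, fitting on inliers only) would be preferable when the sparse depths are noisy.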

