CV · Dec 5, 2023

WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation

arXiv:2312.02934v4 · 95 citations · h-index: 16 · Has Code · ECCV
Originality: Incremental advance
AI Analysis

The paper addresses the need for diverse and extensive data in autonomous driving, though the advance appears incremental: it builds on existing diffusion-based methods and adds an explicit world volume integration.

The paper tackles the problem of generating multi-camera street-view videos for autonomous driving datasets by proposing WoVoGen, a diffusion-based method that uses a 4D world volume to ensure intra-world consistency and inter-sensor coherence, resulting in high-quality videos and enabling scene editing tasks.

Generating multi-camera street-view videos is critical for augmenting autonomous driving datasets, addressing the urgent demand for extensive and varied data. Because of their limited diversity and difficulty in handling lighting conditions, traditional rendering-based methods are increasingly being supplanted by diffusion-based methods. However, a significant challenge in diffusion-based methods is ensuring that the generated sensor data preserve both intra-world consistency and inter-sensor coherence. To address these challenges, we incorporate an explicit world volume and propose the World Volume-aware Multi-camera Driving Scene Generator (WoVoGen). This system is specifically designed to leverage a 4D world volume as a foundational element for video generation. Our model operates in two distinct phases: (i) envisioning the future 4D temporal world volume based on vehicle control sequences, and (ii) generating multi-camera videos, informed by this envisioned 4D temporal world volume and sensor interconnectivity. The incorporation of the 4D world volume empowers WoVoGen not only to generate high-quality street-view videos in response to vehicle control inputs but also to facilitate scene editing tasks.
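
The two phases described in the abstract can be pictured as a simple data flow. The sketch below is only an illustration of that flow and is not the authors' implementation: the function names, tensor layouts, the control format, and the camera count are all assumptions, and both diffusion models are replaced by trivial placeholders (a grid shift and a broadcast) so the example runs on its own.

```python
import numpy as np


def envision_world_volume(past_volume, controls):
    """Phase (i) placeholder: roll the latest volume forward, one step per control.

    past_volume: (T_past, X, Y, Z, C) array, a hypothetical 4D world volume layout.
    controls:    (T_future, 2) array of [forward shift in cells, yaw rate].
                 Both the format and its meaning are assumptions; yaw is ignored here.
    A real system would use a learned generative model instead of np.roll.
    """
    last = past_volume[-1]
    future = []
    for control in controls:
        shift = int(round(float(control[0])))
        last = np.roll(last, shift=shift, axis=0)  # move content along the X axis
        future.append(last)
    return np.stack(future)  # (T_future, X, Y, Z, C)


def generate_multicamera_videos(world_volume, num_cameras, image_hw):
    """Phase (ii) placeholder: one frame per camera per time step.

    Every camera at a given time step is derived from the same volume, which is
    the property the abstract calls inter-sensor coherence. Here each frame is
    just the volume's mean value broadcast to an image.
    """
    t = world_volume.shape[0]
    h, w = image_hw
    summary = world_volume.reshape(t, -1).mean(axis=1)  # (T,)
    frames = np.zeros((t, num_cameras, h, w, 3))
    return frames + summary[:, None, None, None, None]


# Toy usage with arbitrary sizes.
past = np.zeros((2, 4, 4, 2, 3))
past[-1, 0, 0, 0, 0] = 1.0                # a single occupied cell
controls = np.array([[1.0, 0.0]] * 4)     # four future steps, one cell forward each
future = envision_world_volume(past, controls)
videos = generate_multicamera_videos(future, num_cameras=6, image_hw=(8, 16))
```

The point of the structure is that the video stage never sees the control inputs directly; it is conditioned only on the envisioned volume, so all cameras share one scene state per time step.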

Code Implementations: 1 repo
