CV · Sep 10, 2024

MyGo: Consistent and Controllable Multi-View Driving Video Generation with Camera Control

arXiv:2409.06189v2 · 3 citations · h-index: 5
AI Analysis

This work addresses the need for high-quality, controllable training data for autonomous driving models, representing an incremental improvement in multi-view consistency and camera controllability.

The paper tackles the problem of generating multi-view driving videos with controllable camera motion. It proposes MyGo, an end-to-end framework that the authors report achieves state-of-the-art results in camera-controlled video generation tasks.

High-quality driving video generation is crucial for providing training data for autonomous driving models. However, current generative models rarely focus on enhancing camera motion control under multi-view tasks, which is essential for driving video generation. Therefore, we propose MyGo, an end-to-end framework for video generation, introducing motion of onboard cameras as conditions to make progress in camera controllability and multi-view consistency. MyGo employs additional plug-in modules to inject camera parameters into the pre-trained video diffusion model, which retains the extensive knowledge of the pre-trained model as much as possible. Furthermore, we use epipolar constraints and neighbor view information during the generation process of each view to enhance spatial-temporal consistency. Experimental results show that MyGo has achieved state-of-the-art results in both general camera-controlled video generation and multi-view driving video generation tasks, which lays the foundation for more accurate environment simulation in autonomous driving. Project page: https://metadrivescape.github.io/papers_project/MyGo/page.html
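The abstract mentions epipolar constraints between views without giving details. As a rough illustration of the underlying geometry only, and not MyGo's implementation, the sketch below builds the fundamental matrix for two pinhole cameras and measures how far a pixel in the second view lies from the epipolar line induced by a pixel in the first view. All function names, the intrinsics, and the relative pose are hypothetical values chosen for the example, and the pose convention X2 = R @ X1 + t (camera-1 coordinates mapped into camera-2 coordinates) is an assumption.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v equals np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])


def fundamental_matrix(K1, K2, R, t):
    """Fundamental matrix F with x2^T F x1 = 0, assuming X2 = R @ X1 + t."""
    E = skew(t) @ R  # essential matrix in normalized camera coordinates
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)


def project(K, X):
    """Project a 3D point in camera coordinates to homogeneous pixel coordinates."""
    x = K @ X
    return x / x[2]


def epipolar_distance(F, x1, x2):
    """Pixel distance from x2 to the epipolar line F @ x1 in the second view."""
    line = F @ x1
    return abs(x2 @ line) / np.hypot(line[0], line[1])


# Hypothetical shared intrinsics and relative pose between two onboard cameras.
K = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])
angle = 0.1  # radians, rotation about the vertical (y) axis
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.5, 0.0, 0.1])

X1 = np.array([1.0, -0.5, 10.0])  # a 3D point in camera-1 coordinates
X2 = R @ X1 + t                   # the same point in camera-2 coordinates

F = fundamental_matrix(K, K, R, t)
x1 = project(K, X1)
x2 = project(K, X2)

# A true correspondence lies on its epipolar line (distance near zero);
# a pixel shifted off the line does not.
on_line = epipolar_distance(F, x1, x2)
off_line = epipolar_distance(F, x1, x2 + np.array([0.0, 5.0, 0.0]))
print(on_line, off_line)
```

For a true correspondence the distance is zero up to floating-point error, while a pixel moved off the line yields a clearly positive distance. A generator could, in principle, use such a distance to restrict or weight which pixels of a neighboring view each pixel attends to; how MyGo applies the constraint is not specified in the abstract.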

