GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving
GAIA-2 addresses the need for scalable, controllable simulation tools in autonomous driving development, though the contribution is incremental in that it builds on existing generative models.
The paper tackles the problem of generating realistic multi-camera driving videos for autonomous driving simulation by introducing GAIA-2, a latent diffusion world model that produces high-resolution, spatiotemporally consistent videos across geographically diverse environments (UK, US, and Germany).
Generative models offer a scalable and flexible paradigm for simulating complex environments, yet current approaches fall short in addressing the domain-specific requirements of autonomous driving, such as multi-agent interactions, fine-grained control, and multi-camera consistency. We introduce GAIA-2, Generative AI for Autonomy, a latent diffusion world model that unifies these capabilities within a single generative framework. GAIA-2 supports controllable video generation conditioned on a rich set of structured inputs: ego-vehicle dynamics, agent configurations, environmental factors, and road semantics. It generates high-resolution, spatiotemporally consistent multi-camera videos across geographically diverse driving environments (UK, US, Germany). The model integrates both structured conditioning and external latent embeddings (e.g., from a proprietary driving model) to facilitate flexible and semantically grounded scene synthesis. Through this integration, GAIA-2 enables scalable simulation of both common and rare driving scenarios, advancing the use of generative world models as a core tool in the development of autonomous systems. Videos are available at https://wayve.ai/thinking/gaia-2.
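To make the structured conditioning concrete, the four input groups named in the abstract (ego-vehicle dynamics, agent configurations, environmental factors, road semantics) plus the optional external latent embedding could be bundled as in the minimal Python sketch below. This is an illustration only, not the GAIA-2 interface: every class, field, and method name here (`AgentBox`, `SceneConditioning`, `active_groups`, the weather and time-of-day fields, the units) is a hypothetical choice that does not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class AgentBox:
    """One dynamic agent as a 3D box (hypothetical representation)."""
    category: str                            # e.g. "car" or "pedestrian"
    center_xyz: Tuple[float, float, float]   # metres, assumed ego frame
    size_lwh: Tuple[float, float, float]     # length, width, height in metres
    yaw_rad: float


@dataclass
class SceneConditioning:
    """Hypothetical bundle of the conditioning groups listed in the abstract."""
    # Ego-vehicle dynamics (field names and units are assumptions)
    ego_speed_mps: Optional[float] = None
    ego_curvature_per_m: Optional[float] = None
    # Agent configurations
    agents: Sequence[AgentBox] = field(default_factory=tuple)
    # Environmental factors (country is from the abstract; the rest are assumed)
    country: Optional[str] = None            # "UK", "US", or "Germany"
    weather: Optional[str] = None
    time_of_day: Optional[str] = None
    # Road semantics, e.g. {"num_lanes": 2.0} (keys are assumptions)
    road_semantics: Dict[str, float] = field(default_factory=dict)
    # External latent embedding, e.g. from another driving model
    external_embedding: Optional[Sequence[float]] = None

    def active_groups(self) -> List[str]:
        """Return the names of the conditioning groups that are actually set."""
        groups: List[str] = []
        if self.ego_speed_mps is not None or self.ego_curvature_per_m is not None:
            groups.append("ego_dynamics")
        if len(self.agents) > 0:
            groups.append("agents")
        if any(v is not None for v in (self.country, self.weather, self.time_of_day)):
            groups.append("environment")
        if self.road_semantics:
            groups.append("road_semantics")
        if self.external_embedding is not None:
            groups.append("external_embedding")
        return groups


# Usage: condition only on ego speed and country, leaving the rest unset.
cond = SceneConditioning(ego_speed_mps=8.0, country="UK")
print(cond.active_groups())
```

Leaving every field optional reflects one plausible design choice: a scenario can then be specified by any subset of the conditioning groups, with the remainder left for the model to fill in.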