CV · Jan 16

Generative Scenario Rollouts for End-to-End Autonomous Driving

arXiv:2601.11475v1 · 2 citations · h-index: 19
Originality: Highly original
AI Analysis

This work addresses the need for safer and more interpretable end-to-end autonomous driving systems by introducing a novel generative approach, though it builds incrementally on existing VLA models.

The paper argues that Vision-Language-Action (VLA) models are underused as generative models in autonomous driving. It proposes Generative Scenario Rollouts (GeRo), a plug-and-play framework that jointly plans and generates language-grounded future traffic scenes through autoregressive rollouts, improving driving score by +15.7 and success rate by +26.2 on Bench2Drive.

Vision-Language-Action (VLA) models are emerging as highly effective planning models for end-to-end autonomous driving systems. However, current works mostly rely on imitation learning from sparse trajectory annotations and under-utilize their potential as generative models. We propose Generative Scenario Rollouts (GeRo), a plug-and-play framework for VLA models that jointly performs planning and generation of language-grounded future traffic scenes through an autoregressive rollout strategy. First, a VLA model is trained to encode ego vehicle and agent dynamics into latent tokens under supervision from planning, motion, and language tasks, facilitating text-aligned generation. Next, GeRo performs language-conditioned autoregressive generation. Given multi-view images, a scenario description, and ego-action questions, it generates future latent tokens and textual responses to guide long-horizon rollouts. A rollout-consistency loss stabilizes predictions using ground truth or pseudo-labels, mitigating drift and preserving text-action alignment. This design enables GeRo to perform temporally consistent, language-grounded rollouts that support long-horizon reasoning and multi-agent planning. On Bench2Drive, GeRo improves driving score and success rate by +15.7 and +26.2, respectively. By integrating reinforcement learning with generative rollouts, GeRo achieves state-of-the-art closed-loop and open-loop performance, demonstrating strong zero-shot robustness. These results highlight the promise of generative, language-conditioned reasoning as a foundation for safer and more interpretable end-to-end autonomous driving.
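
To make the rollout idea in the abstract concrete, here is a minimal toy sketch of an autoregressive latent rollout scored by a consistency loss. It is not the authors' implementation: the names `rollout`, `rollout_consistency_loss`, and `toy_step` are hypothetical, the latent tokens are plain lists of floats, the step function is a stand-in for a trained VLA model, and mean squared error is an assumed loss form (the abstract does not specify one). Language conditioning and textual responses are left out.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def rollout(step_fn: Callable[[Vector], Vector],
            initial_latent: Sequence[float],
            horizon: int) -> List[Vector]:
    """Apply step_fn autoregressively: each prediction is fed back as the next input."""
    latents: List[Vector] = []
    current = list(initial_latent)
    for _ in range(horizon):
        current = step_fn(current)
        latents.append(current)
    return latents


def rollout_consistency_loss(predicted: Sequence[Sequence[float]],
                             targets: Sequence[Sequence[float]]) -> float:
    """Mean squared error between rolled-out latents and reference latents.

    The references play the role of ground truth or pseudo-labels.
    MSE is an assumption made for this sketch.
    """
    if len(predicted) != len(targets):
        raise ValueError("predicted and targets must cover the same number of steps")
    total = 0.0
    count = 0
    for pred_step, target_step in zip(predicted, targets):
        if len(pred_step) != len(target_step):
            raise ValueError("latent dimensions must match at every step")
        for p, t in zip(pred_step, target_step):
            total += (p - t) ** 2
            count += 1
    return total / count


def toy_step(latent: Vector) -> Vector:
    """Hypothetical stand-in for the model's one-step latent prediction."""
    return [0.5 * x for x in latent]


# Roll out three future steps from a two-dimensional starting latent.
predicted = rollout(toy_step, [1.0, 2.0], horizon=3)

# Reference latents that agree with the rollout for two steps, then diverge.
targets = [[0.5, 1.0], [0.25, 0.5], [0.0, 0.0]]
loss = rollout_consistency_loss(predicted, targets)
```

In this toy setup the loss is nonzero only because the third step diverges from its reference. A penalty of this kind discourages drift over a long horizon, which is the role the abstract gives to the rollout-consistency loss.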
