CV · RO · Nov 22, 2023

ADriver-I: A General World Model for Autonomous Driving

arXiv:2311.13549v1 · 114 citations · h-index: 27
Originality: Incremental advance
AI Analysis

This work targets the redundancy of modular autonomous driving stacks by proposing a single integrated world model. The advance appears incremental, since it builds on existing MLLM and diffusion methods.

The paper tackles the redundancy in modular autonomous driving systems by introducing ADriver-I, a general world model that uses interleaved vision-action pairs and combines a multimodal large language model with a diffusion model to predict control signals and future frames. The authors report impressive performance against several constructed baselines on nuScenes and private datasets.

Typically, autonomous driving adopts a modular design, which divides the full stack into perception, prediction, planning and control parts. Though interpretable, such a modular design tends to introduce a substantial amount of redundancy. Recently, multimodal large language models (MLLM) and diffusion techniques have demonstrated superior comprehension and generation abilities. In this paper, we first introduce the concept of the interleaved vision-action pair, which unifies the format of visual features and control signals. Using these vision-action pairs, we construct a general world model for autonomous driving, built on an MLLM and a diffusion model and termed ADriver-I. It takes the vision-action pairs as inputs and autoregressively predicts the control signal of the current frame. The generated control signals, together with the historical vision-action pairs, then serve as conditions for predicting the future frames. With the predicted next frame, ADriver-I performs further control signal prediction. This process can be repeated indefinitely, so ADriver-I achieves autonomous driving in a world it creates itself. Extensive experiments are conducted on nuScenes and our large-scale private datasets. ADriver-I shows impressive performance compared to several constructed baselines. We hope ADriver-I can provide some new insights for future autonomous driving and embodied intelligence.
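
The closed loop described in the abstract (predict a control signal for the current frame, then predict the next frame from the history, then repeat) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `VisionActionPair`, `rollout`, `toy_policy` and `toy_world` are hypothetical, frames are plain lists of floats, and the two toy functions merely stand in for the MLLM and the diffusion model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Frame = List[float]           # toy stand-in for an image or its visual features
Action = Tuple[float, float]  # toy control signal, e.g. (speed, steering angle)


@dataclass
class VisionActionPair:
    """One interleaved step: a frame and the control signal chosen for it."""
    frame: Frame
    action: Action


def rollout(
    history: List[VisionActionPair],
    current_frame: Frame,
    predict_action: Callable[[List[VisionActionPair], Frame], Action],
    predict_frame: Callable[[List[VisionActionPair]], Frame],
    steps: int,
) -> List[VisionActionPair]:
    """Alternate control prediction and frame prediction for `steps` iterations."""
    history = list(history)  # copy so the caller's list is not mutated
    frame = current_frame
    generated: List[VisionActionPair] = []
    for _ in range(steps):
        # 1) Predict the control signal of the current frame from the history.
        action = predict_action(history, frame)
        pair = VisionActionPair(frame, action)
        history.append(pair)
        generated.append(pair)
        # 2) Predict the next frame, conditioned on history plus the new action.
        frame = predict_frame(history)
    return generated


# Hypothetical toy models, assumptions for illustration only.
def toy_policy(history: List[VisionActionPair], frame: Frame) -> Action:
    # Speed is the mean frame value; steering stays at zero.
    return (sum(frame) / len(frame), 0.0)


def toy_world(history: List[VisionActionPair]) -> Frame:
    # The next frame shifts every value of the last frame by the last speed.
    last = history[-1]
    return [x + last.action[0] for x in last.frame]


trajectory = rollout([], [1.0, 1.0], toy_policy, toy_world, steps=3)
```

Because every predicted frame is fed back as the next input, `steps` can be set arbitrarily large; in a real system, prediction errors would accumulate over such a rollout, which this toy version does not model.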

