CV · Feb 24

An interactive enhanced driving dataset for autonomous driving

arXiv:2602.20575v1 · h-index: 9
Originality: Synthesis-oriented
AI Analysis

This dataset addresses the need for stronger interactive capabilities in autonomous driving models, though the contribution is incremental because it builds on existing data collection methods.

The paper tackles the lack of interactive scenarios and multimodal alignment in autonomous driving data by proposing the Interactive Enhanced Driving Dataset (IEDD). IEDD includes million-level interactive segments and a VQA dataset with aligned semantic actions, and the paper provides benchmark results for ten VLMs to demonstrate the dataset's reuse value.

The evolution of autonomous driving towards full automation demands robust interactive capabilities; however, the development of Vision-Language-Action (VLA) models is constrained by the sparsity of interactive scenarios and inadequate multimodal alignment in existing data. To this end, this paper proposes the Interactive Enhanced Driving Dataset (IEDD). We develop a scalable pipeline to mine million-level interactive segments from naturalistic driving data based on interactive trajectories, and design metrics to quantify the interaction processes. Furthermore, the IEDD-VQA dataset is constructed by generating synthetic Bird's Eye View (BEV) videos where semantic actions are strictly aligned with structured language. Benchmark results evaluating ten mainstream Vision Language Models (VLMs) are provided to demonstrate the dataset's reuse value in assessing and fine-tuning the reasoning capabilities of autonomous driving models.
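The abstract does not spell out how interactive segments are mined or which metrics quantify them. As a rough illustration of the general idea of trajectory-based mining only, the sketch below flags frame ranges in which two agents stay within a distance threshold and reports the closest approach in each range. The function names, the distance-threshold criterion, and the parameters `dist_threshold` and `min_frames` are all assumptions made for this example, not the IEDD pipeline or its metrics.

```python
import math

def find_interactive_segments(traj_a, traj_b, dist_threshold=10.0, min_frames=3):
    """Return (start, end) frame ranges, end exclusive, where two agents stay close.

    Hypothetical criterion: Euclidean distance below dist_threshold for at
    least min_frames consecutive frames. Trajectories are lists of (x, y)
    positions sampled at the same timestamps.
    """
    segments = []
    start = None
    n = min(len(traj_a), len(traj_b))
    for i in range(n):
        (xa, ya), (xb, yb) = traj_a[i], traj_b[i]
        close = math.hypot(xa - xb, ya - yb) < dist_threshold
        if close and start is None:
            start = i  # a candidate segment begins
        elif not close and start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    # Close a segment that runs to the end of the trajectories.
    if start is not None and n - start >= min_frames:
        segments.append((start, n))
    return segments

def min_gap(traj_a, traj_b, segment):
    """Smallest distance between the two agents inside a segment (illustrative metric)."""
    start, end = segment
    return min(
        math.hypot(traj_a[i][0] - traj_b[i][0], traj_a[i][1] - traj_b[i][1])
        for i in range(start, end)
    )

# Toy crossing scenario: one agent moves along x, the other crosses its path along y.
ego = [(float(t), 0.0) for t in range(10)]
other = [(5.0, 20.0 - 4.0 * t) for t in range(10)]
segments = find_interactive_segments(ego, other, dist_threshold=6.0, min_frames=2)
closest = [min_gap(ego, other, s) for s in segments]
```

A real pipeline would likely use richer signals than raw distance, such as heading, speed, or time to conflict, but the same scan-and-threshold structure scales to large trajectory logs because each agent pair is processed in a single pass.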

