CV | Mar 27, 2025

RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives

arXiv:2503.21459v1 | 17 citations | h-index: 6 | Has Code | CVPR
Originality: Synthesis-oriented
AI Analysis

RoadSocial provides a diverse benchmark for road event understanding from social media and addresses biases in existing datasets, but the contribution is incremental: it focuses on dataset creation and benchmarking.

The authors address road event understanding by introducing RoadSocial, a large-scale VideoQA dataset of 13.2K social media videos and 260K QA pairs. They benchmark 18 Video LLMs on it and show that it can improve the road event understanding of general-purpose models.

We introduce RoadSocial, a large-scale, diverse VideoQA dataset tailored for generic road event understanding from social media narratives. Unlike existing datasets limited by regional bias, viewpoint bias and expert-driven annotations, RoadSocial captures the global complexity of road events with varied geographies, camera viewpoints (CCTV, handheld, drones) and rich social discourse. Our scalable semi-automatic annotation framework leverages Text LLMs and Video LLMs to generate comprehensive question-answer pairs across 12 challenging QA tasks, pushing the boundaries of road event understanding. RoadSocial is derived from social media videos spanning 14M frames and 414K social comments, resulting in a dataset with 13.2K videos, 674 tags and 260K high-quality QA pairs. We evaluate 18 Video LLMs (open-source and proprietary, driving-specific and general-purpose) on our road event understanding benchmark. We also demonstrate RoadSocial's utility in improving road event understanding capabilities of general-purpose Video LLMs.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
