CV · Mar 27, 2025

RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives

Chirag Parikh, Deepti Rawat, Rakshitha R. T., Tathagata Ghosh, Ravi Kiran Sarvadevabhatla

arXiv:2503.21459v1 · 21.117 citations · h-index: 6 · Has Code · CVPR

Originality: Synthesis-oriented

AI Analysis

This work provides a diverse benchmark for road event understanding from social media and addresses biases in existing datasets, but its contribution is incremental because it centers on dataset creation and benchmarking.

The authors tackle road event understanding by introducing RoadSocial, a large-scale VideoQA dataset built from social media videos (13.2K videos, 260K QA pairs). They benchmark 18 Video LLMs and show that the dataset can improve the road event understanding of general-purpose models.

We introduce RoadSocial, a large-scale, diverse VideoQA dataset tailored for generic road event understanding from social media narratives. Unlike existing datasets limited by regional bias, viewpoint bias and expert-driven annotations, RoadSocial captures the global complexity of road events with varied geographies, camera viewpoints (CCTV, handheld, drones) and rich social discourse. Our scalable semi-automatic annotation framework leverages Text LLMs and Video LLMs to generate comprehensive question-answer pairs across 12 challenging QA tasks, pushing the boundaries of road event understanding. RoadSocial is derived from social media videos spanning 14M frames and 414K social comments, resulting in a dataset with 13.2K videos, 674 tags and 260K high-quality QA pairs. We evaluate 18 Video LLMs (open-source and proprietary, driving-specific and general-purpose) on our road event understanding benchmark. We also demonstrate RoadSocial's utility in improving road event understanding capabilities of general-purpose Video LLMs.
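To put the reported scale in per-video terms, the short sketch below divides the abstract's rounded totals by the 13.2K videos. It is only back-of-the-envelope arithmetic, not anything from the paper's code. It assumes the 14M frames and 414K comments belong to the final 13.2K videos; the abstract says the dataset is "derived from" them, so if the source pool was larger before filtering, those two averages overstate the per-video figures. The names `per_video` and `stats` are hypothetical.

```python
# Totals as reported in the abstract (already rounded there, so results are approximate).
NUM_VIDEOS = 13_200
NUM_QA_PAIRS = 260_000
NUM_FRAMES = 14_000_000   # assumption: these frames belong to the final 13.2K videos
NUM_COMMENTS = 414_000    # assumption: same as above, for social comments


def per_video(total: int, videos: int = NUM_VIDEOS) -> float:
    """Return the average count per video, rounded to one decimal place."""
    return round(total / videos, 1)


# Hypothetical summary of per-video averages derived from the abstract's totals.
stats = {
    "qa_pairs_per_video": per_video(NUM_QA_PAIRS),
    "frames_per_video": per_video(NUM_FRAMES),
    "comments_per_video": per_video(NUM_COMMENTS),
}

for name, value in stats.items():
    print(f"{name}: {value}")
```

Under these assumptions the dataset averages roughly 20 QA pairs, about 1,060 frames and about 31 social comments per video.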

