CV · Jul 19, 2025

InterAct-Video: Reasoning-Rich Video QA for Urban Traffic

Joseph Raj Vishal, Divesh Basina, Rutuja Patil, Manas Srinivas Gowda, Katha Naik, Yezhou Yang, Bharatesh Chakravarthi

arXiv:2507.14743v3 · 8.42 citations · h-index: 8 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for domain-specific datasets that improve VideoQA models for traffic monitoring in intelligent transportation systems, although the contribution is incremental, since it builds on existing VideoQA methods.

The paper tackles video question answering (VideoQA) for complex real-world traffic scenes by introducing the InterAct VideoQA dataset (8 hours of footage, over 25,000 QA pairs), and it shows that fine-tuning models on this dataset yields notable performance improvements.

Traffic monitoring is crucial for urban mobility, road safety, and intelligent transportation systems (ITS). Deep learning has advanced video-based traffic monitoring through video question answering (VideoQA) models, enabling structured insight extraction from traffic videos. However, existing VideoQA models struggle with the complexity of real-world traffic scenes, where multiple concurrent events unfold across spatiotemporal dimensions. To address these challenges, this paper introduces InterAct VideoQA, a curated dataset designed to benchmark and enhance VideoQA models for traffic monitoring tasks. The InterAct VideoQA dataset comprises 8 hours of real-world traffic footage collected from diverse intersections, segmented into 10-second video clips, with over 25,000 question-answer (QA) pairs covering spatiotemporal dynamics, vehicle interactions, incident detection, and other critical traffic attributes. State-of-the-art VideoQA models are evaluated on InterAct VideoQA, exposing challenges in reasoning over fine-grained spatiotemporal dependencies within complex traffic scenarios. Additionally, fine-tuning these models on InterAct VideoQA yields notable performance improvements, demonstrating the necessity of domain-specific datasets for VideoQA. InterAct VideoQA is publicly available as a benchmark dataset to facilitate future research in real-world deployable VideoQA models for intelligent transportation systems.

GitHub Repo: https://github.com/joe-rabbit/InterAct_VideoQA
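To give a rough sense of scale for the figures above, the sketch below computes how many 10-second clips 8 hours of footage yields and the implied average number of QA pairs per clip. It is a back-of-the-envelope illustration only, assuming the footage is split into contiguous, non-overlapping 10-second clips with nothing discarded; the abstract does not state the actual segmentation policy or the exact clip count. The helper clip_boundaries and its policy of dropping a short trailing remainder are hypothetical.

```python
"""Back-of-the-envelope arithmetic for the InterAct VideoQA figures.

Assumption (not stated in the abstract): the 8 hours of footage are split
into contiguous, non-overlapping 10-second clips and no footage is discarded.
"""

FOOTAGE_HOURS = 8               # from the abstract
CLIP_SECONDS = 10               # from the abstract
QA_PAIRS_LOWER_BOUND = 25_000   # the abstract says "over 25,000"


def clip_boundaries(total_seconds: int, clip_seconds: int) -> list[tuple[int, int]]:
    """Hypothetical helper: (start, end) times in seconds for non-overlapping clips.

    Any trailing remainder shorter than clip_seconds is dropped.
    """
    n_full = total_seconds // clip_seconds
    return [(i * clip_seconds, (i + 1) * clip_seconds) for i in range(n_full)]


total_seconds = FOOTAGE_HOURS * 3600                 # 28,800 seconds of footage
clips = clip_boundaries(total_seconds, CLIP_SECONDS)
qa_per_clip = QA_PAIRS_LOWER_BOUND / len(clips)      # lower bound on the average

print(f"clips: {len(clips)}")                               # clips: 2880
print(f"QA pairs per clip (lower bound): {qa_per_clip:.1f}")  # ... 8.7
```

Under these assumptions the dataset would contain 2,880 clips, with an average of at least about 8.7 QA pairs per clip; if clips overlap or some footage is filtered out, both numbers change.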
