Ruthvik Bokkasam

CV
h-index46
3papers
3citations
Novelty20%
AI Score38

3 Papers

CVNov 4, 2025
The Urban Vision Hackathon Dataset and Models: Towards Image Annotations and Accurate Vision Models for Indian Traffic

Akash Sharma, Chinmay Mhatre, Sankalp Gawali et al.

This report describes the UVH-26 dataset, the first public release by AIM@IISc of a large-scale dataset of annotated traffic-camera images from India. The dataset comprises 26,646 high-resolution (1080p) images sampled from 2800 Bengaluru's Safe-City CCTV cameras over a 4-week period, and subsequently annotated through a crowdsourced hackathon involving 565 college students from across India. In total, 1.8 million bounding boxes were labeled across 14 vehicle classes specific to India: Cycle, 2-Wheeler (Motorcycle), 3-Wheeler (Auto-rickshaw), LCV (Light Commercial Vehicles), Van, Tempo-traveller, Hatchback, Sedan, SUV, MUV, Mini-bus, Bus, Truck and Other. Of these, 283k-316k consensus ground truth bounding boxes and labels were derived for distinct objects in the 26k images using Majority Voting and STAPLE algorithms. Further, we train multiple contemporary detectors, including YOLO11-S/X, RT-DETR-S/X, and DAMO-YOLO-T/L using these datasets, and report accuracy based on mAP50, mAP75 and mAP50:95. Models trained on UVH-26 achieve 8.4-31.5% improvements in mAP50:95 over equivalent baseline models trained on COCO dataset, with RT-DETR-X showing the best performance at 0.67 (mAP50:95) as compared to 0.40 for COCO-trained weights for common classes (Car, Bus, and Truck). This demonstrates the benefits of domain-specific training data for Indian traffic scenarios. The release package provides the 26k images with consensus annotations based on Majority Voting (UVH-26-MV) and STAPLE (UVH-26-ST) and the 6 fine-tuned YOLO and DETR models on each of these datasets. By capturing the heterogeneity of Indian urban mobility directly from operational traffic-camera streams, UVH-26 addresses a critical gap in existing global benchmarks, and offers a foundation for advancing detection, classification, and deployment of intelligent transportation systems in emerging nations with complex traffic conditions.

56.8CVApr 27Code
BMD-45: A Large-Scale CCTV Vehicle Detection Dataset for Urban Traffic in Developing Cities

Akash Sharma, Chinmay Mhatre, Sankalp Gawali et al.

Robust vehicle detection from fixed CCTV cameras is critical for Intelligent Transportation Systems. Yet existing benchmarks predominantly feature relatively homogeneous, highly organized traffic patterns captured from ego-centric driving perspectives or controlled aerial views. This regional and sensor view bias creates a significant gap. Models trained on datasets such as UA-DETRAC and COCO struggle to generalize to the dense, heterogeneous, disorganized traffic conditions observed in rapidly developing urban centers in emerging economies. To address this limitation, we introduce BMD-45, a large-scale dataset comprising 480K bounding boxes annotated over 45K images captured from over 3.6K operational Safe City CCTV cameras. BMD-45 contains 14 fine-grained vehicle categories, including region-specific modes such as auto-rickshaws and tempo travellers, which are not present in existing benchmarks. The dataset captures real-world deployment challenges, including extreme viewpoint variation, occlusion, and vehicle density . We establish comprehensive baselines using state-of-the-art detectors and reveal a striking domain gap: models fine-tuned on UA-DETRAC achieve only 33.6% mAP@0.50:0.95, compared to 83.8% when trained in-domain on BMD-45, representing a 2.5x improvement that persists even when accounting for novel vehicle classes. This performance gap underscores the critical need for geographically diverse traffic benchmarks and establishes BMD-45 as a baseline for developing robust perception systems in underrepresented urban environments worldwide. The dataset is available at: https://huggingface.co/datasets/iisc-aim/BMD-45.

CVJun 27, 2025
Pedestrian Intention and Trajectory Prediction in Unstructured Traffic Using IDD-PeD

Ruthvik Bokkasam, Shankar Gangisetty, A. H. Abdul Hafez et al.

With the rapid advancements in autonomous driving, accurately predicting pedestrian behavior has become essential for ensuring safety in complex and unpredictable traffic conditions. The growing interest in this challenge highlights the need for comprehensive datasets that capture unstructured environments, enabling the development of more robust prediction models to enhance pedestrian safety and vehicle navigation. In this paper, we introduce an Indian driving pedestrian dataset designed to address the complexities of modeling pedestrian behavior in unstructured environments, such as illumination changes, occlusion of pedestrians, unsignalized scene types and vehicle-pedestrian interactions. The dataset provides high-level and detailed low-level comprehensive annotations focused on pedestrians requiring the ego-vehicle's attention. Evaluation of the state-of-the-art intention prediction methods on our dataset shows a significant performance drop of up to $\mathbf{15\%}$, while trajectory prediction methods underperform with an increase of up to $\mathbf{1208}$ MSE, defeating standard pedestrian datasets. Additionally, we present exhaustive quantitative and qualitative analysis of intention and trajectory baselines. We believe that our dataset will open new challenges for the pedestrian behavior research community to build robust models. Project Page: https://cvit.iiit.ac.in/research/projects/cvit-projects/iddped