Vishwajeet Pattanaik

CV
h-index: 46
5 papers
8 citations
Novelty: 41%
AI Score: 41

5 Papers

CV · Nov 4, 2025
The Urban Vision Hackathon Dataset and Models: Towards Image Annotations and Accurate Vision Models for Indian Traffic

Akash Sharma, Chinmay Mhatre, Sankalp Gawali et al.

This report describes the UVH-26 dataset, the first public release by AIM@IISc of a large-scale dataset of annotated traffic-camera images from India. The dataset comprises 26,646 high-resolution (1080p) images sampled from 2,800 of Bengaluru's Safe-City CCTV cameras over a 4-week period, and subsequently annotated through a crowdsourced hackathon involving 565 college students from across India. In total, 1.8 million bounding boxes were labeled across 14 vehicle classes specific to India: Cycle, 2-Wheeler (Motorcycle), 3-Wheeler (Auto-rickshaw), LCV (Light Commercial Vehicles), Van, Tempo-traveller, Hatchback, Sedan, SUV, MUV, Mini-bus, Bus, Truck and Other. Of these, 283k-316k consensus ground truth bounding boxes and labels were derived for distinct objects in the 26k images using Majority Voting and STAPLE algorithms. Further, we train multiple contemporary detectors, including YOLO11-S/X, RT-DETR-S/X, and DAMO-YOLO-T/L, using these datasets, and report accuracy based on mAP50, mAP75 and mAP50:95. Models trained on UVH-26 achieve 8.4-31.5% improvements in mAP50:95 over equivalent baseline models trained on the COCO dataset, with RT-DETR-X showing the best performance at 0.67 (mAP50:95) as compared to 0.40 for COCO-trained weights for common classes (Car, Bus, and Truck). This demonstrates the benefits of domain-specific training data for Indian traffic scenarios. The release package provides the 26k images with consensus annotations based on Majority Voting (UVH-26-MV) and STAPLE (UVH-26-ST) and the 6 fine-tuned YOLO and DETR models on each of these datasets. By capturing the heterogeneity of Indian urban mobility directly from operational traffic-camera streams, UVH-26 addresses a critical gap in existing global benchmarks, and offers a foundation for advancing detection, classification, and deployment of intelligent transportation systems in emerging nations with complex traffic conditions.
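As a rough illustration of how overlapping crowd annotations can be reduced to a consensus box by majority voting, here is a minimal Python sketch. It is not the procedure used for UVH-26: the greedy grouping rule, the 0.5 IoU threshold, the coordinate-wise averaging, and the names `iou` and `majority_vote` are all assumptions made for this example.

```python
from collections import Counter


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def majority_vote(annotations, n_annotators, iou_threshold=0.5):
    """Hypothetical consensus rule (an assumption, not the UVH-26 pipeline).

    annotations: list of (annotator_id, box, label) for one image.
    Boxes are grouped greedily against the first box of each group; a group
    is kept only if more than half of the annotators contributed to it.
    """
    groups = []  # each group is a list of (annotator_id, box, label)
    for ann in annotations:
        for group in groups:
            if iou(group[0][1], ann[1]) >= iou_threshold:
                group.append(ann)
                break
        else:
            groups.append([ann])

    consensus = []
    for group in groups:
        voters = {annotator for annotator, _, _ in group}
        if len(voters) * 2 <= n_annotators:
            continue  # no strict majority for this object
        # Consensus box: coordinate-wise mean of the grouped boxes.
        box = tuple(sum(b[i] for _, b, _ in group) / len(group) for i in range(4))
        # Consensus label: most frequent label in the group.
        label = Counter(lbl for _, _, lbl in group).most_common(1)[0][0]
        consensus.append((box, label))
    return consensus


# Toy example: three annotators, one object seen by all, one seen by only one.
toy = [
    ("a", (10, 10, 50, 50), "Sedan"),
    ("b", (12, 11, 52, 49), "Sedan"),
    ("c", (11, 9, 49, 51), "Hatchback"),
    ("a", (200, 200, 240, 260), "Bus"),
]
result = majority_vote(toy, n_annotators=3)
print(result)
```

In this toy input only the first object survives, because the second box was drawn by a single annotator out of three. The same `iou` function is the overlap measure underlying metrics such as mAP50, where a detection counts as correct when its IoU with a ground-truth box is at least 0.5.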

CV · Apr 27 · Code
BMD-45: A Large-Scale CCTV Vehicle Detection Dataset for Urban Traffic in Developing Cities

Akash Sharma, Chinmay Mhatre, Sankalp Gawali et al.

Robust vehicle detection from fixed CCTV cameras is critical for Intelligent Transportation Systems. Yet existing benchmarks predominantly feature relatively homogeneous, highly organized traffic patterns captured from ego-centric driving perspectives or controlled aerial views. This regional and sensor-view bias creates a significant gap: models trained on datasets such as UA-DETRAC and COCO struggle to generalize to the dense, heterogeneous, disorganized traffic conditions observed in rapidly developing urban centers in emerging economies. To address this limitation, we introduce BMD-45, a large-scale dataset comprising 480K bounding boxes annotated over 45K images captured from over 3.6K operational Safe City CCTV cameras. BMD-45 contains 14 fine-grained vehicle categories, including region-specific modes such as auto-rickshaws and tempo travellers, which are not present in existing benchmarks. The dataset captures real-world deployment challenges, including extreme viewpoint variation, occlusion, and vehicle density. We establish comprehensive baselines using state-of-the-art detectors and reveal a striking domain gap: models fine-tuned on UA-DETRAC achieve only 33.6% mAP@0.50:0.95, compared to 83.8% when trained in-domain on BMD-45, representing a 2.5x improvement that persists even when accounting for novel vehicle classes. This performance gap underscores the critical need for geographically diverse traffic benchmarks and establishes BMD-45 as a baseline for developing robust perception systems in underrepresented urban environments worldwide. The dataset is available at: https://huggingface.co/datasets/iisc-aim/BMD-45.

LG · Oct 24, 2023
Learning Low-Rank Latent Spaces with Simple Deterministic Autoencoder: Theoretical and Empirical Insights

Alokendu Mazumder, Tirthajit Baruah, Bhartendu Kumar et al.

The autoencoder is an unsupervised learning paradigm that aims to create a compact latent representation of data by minimizing the reconstruction loss. However, it tends to overlook the fact that most data (images) are embedded in a lower-dimensional space, which is crucial for effective data representation. To address this limitation, we propose a novel approach called Low-Rank Autoencoder (LoRAE). In LoRAE, we incorporate a low-rank regularizer to adaptively reconstruct a low-dimensional latent space while preserving the basic objective of an autoencoder. This helps embed the data in a lower-dimensional space while preserving important information. It is a simple autoencoder extension that learns a low-rank latent space. Theoretically, we establish a tighter error bound for our model. Empirically, our model's superiority is evident across various tasks such as image generation and downstream classification. Both theoretical and practical outcomes highlight the importance of acquiring low-dimensional embeddings.
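The abstract does not say which low-rank regularizer is used. As a hedged illustration only, one common choice is a nuclear-norm penalty on a linear map placed between encoder and decoder. The symbols below ($E_\theta$, $D_\phi$, $M$, $\lambda$) are assumptions introduced for this sketch, not notation from the paper.

```latex
% Illustrative objective, assuming a nuclear-norm penalty on a linear map M
% between an encoder E_\theta and a decoder D_\phi (all symbols hypothetical).
\mathcal{L}(\theta, \phi, M)
  = \frac{1}{N} \sum_{i=1}^{N}
    \bigl\lVert x_i - D_\phi\bigl(M\, E_\theta(x_i)\bigr) \bigr\rVert_2^2
  + \lambda \, \lVert M \rVert_{*},
\qquad
\lVert M \rVert_{*} = \sum_{k} \sigma_k(M)
```

Here $\sigma_k(M)$ are the singular values of $M$. The nuclear norm is a standard convex surrogate for matrix rank, so a larger $\lambda$ pushes the latent codes $M\,E_\theta(x)$ toward a lower-dimensional subspace, while the first term keeps the usual reconstruction objective.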

IV · Sep 25, 2024
A Visual-Analytical Approach for Automatic Detection of Cyclonic Events in Satellite Observations

Akash Agrawal, Mayesh Mohapatra, Abhinav Raja et al.

Estimating the location and intensity of tropical cyclones is crucial for predicting catastrophic weather events. In this study, we approach this task as a detection and regression challenge, specifically over the North Indian Ocean (NIO) region, where best-track location and wind speed information serve as the labels. The current process for cyclone detection and intensity estimation involves physics-based simulation studies, which are time-consuming; using only image features will automate the process for significantly faster and more accurate predictions. While conventional methods typically necessitate substantial prior knowledge for training, we are exploring alternative approaches to enhance efficiency. This research focuses specifically on cyclone detection, intensity estimation and related aspects using only image input and data-driven approaches, and will lead to faster inference and an automated process, as opposed to the current NWP models being utilized at SAC. For algorithm development, a novel two-stage detection and intensity estimation module is proposed. In the first-level detection, we try to localize the cyclone over an entire image as captured by INSAT3D over the NIO. For the intensity estimation task, we propose a CNN-LSTM network with a ResNet-18 backbone, which works on the cyclone-centered images and captures both temporal and spatial characteristics.

LG · May 29, 2023
DeepVAT: A Self-Supervised Technique for Cluster Assessment in Image Datasets

Alokendu Mazumder, Tirthajit Baruah, Akash Kumar Singh et al.

Estimating the number of clusters and cluster structures in unlabeled, complex, and high-dimensional datasets (like images) is challenging for traditional clustering algorithms. In recent years, a matrix reordering-based algorithm called Visual Assessment of Tendency (VAT) and its variants have attracted many researchers from various domains to estimate the number of clusters and inherent cluster structure present in the data. However, these algorithms face significant challenges when dealing with image data, as they fail to effectively capture the crucial features inherent in images. To overcome these limitations, we propose a deep-learning-based framework that enables the assessment of cluster structure in complex image datasets. Our approach utilizes a self-supervised deep neural network to generate representative embeddings for the data. These embeddings are then reduced to two dimensions using t-distributed Stochastic Neighbour Embedding (t-SNE) and fed into VAT-based algorithms to estimate the underlying cluster structure. Importantly, our framework does not rely on any prior knowledge of the number of clusters. Our proposed approach demonstrates superior performance compared to state-of-the-art VAT family algorithms and two other deep clustering algorithms on four benchmark image datasets, namely MNIST, FMNIST, CIFAR-10, and INTEL.
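The matrix-reordering step at the heart of VAT can be sketched in a few lines of pure Python. This is a minimal version of the classic VAT ordering (a Prim-style nearest-neighbour traversal of a dissimilarity matrix), not the DeepVAT implementation. The toy 1-D points stand in for the 2-D t-SNE embeddings mentioned above, and the function names `vat_order` and `reorder` are chosen for this example.

```python
def vat_order(dist):
    """Return the VAT ordering for a symmetric dissimilarity matrix.

    dist is a list of lists. The traversal starts at one endpoint of the
    largest dissimilarity, then repeatedly appends the unvisited point that
    is closest to any already visited point.
    """
    n = len(dist)
    start = max(range(n), key=lambda i: max(dist[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # nearest[j]: smallest dissimilarity from j to any point already ordered.
    nearest = {j: dist[start][j] for j in remaining}
    while remaining:
        j = min(remaining, key=lambda k: (nearest[k], k))
        order.append(j)
        remaining.remove(j)
        for k in remaining:
            nearest[k] = min(nearest[k], dist[j][k])
    return order


def reorder(dist, order):
    """Permute rows and columns of dist according to order."""
    return [[dist[i][j] for j in order] for i in order]


# Toy data (an assumption for illustration): two well-separated 1-D groups.
points = [0.0, 10.0, 0.5, 10.4, 0.2]
dist = [[abs(p - q) for q in points] for p in points]

order = vat_order(dist)
reordered = reorder(dist, order)
print(order)
for row in reordered:
    print(["%.1f" % v for v in row])
```

When the reordered matrix is rendered as an image, each cluster appears as a dark block along the diagonal, so counting the blocks gives a visual estimate of the number of clusters without specifying it in advance.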