CV | Nov 28, 2025 | Code
Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding
Anik De, Abhirama Subramanyam Penamakuri, Rajeev Yadav et al.
Reading scene text, that is, text appearing in images, has numerous application areas, including assistive technology, search, and e-commerce. Although scene text recognition in English has advanced significantly and is often considered nearly a solved problem, Indian language scene text recognition remains an open challenge. This is due to script diversity, non-standard fonts, and varying writing styles, and, more importantly, the lack of high-quality datasets and open-source models. To address these gaps, we introduce the Bharat Scene Text Dataset (BSTD), a large-scale and comprehensive benchmark for studying Indian language scene text recognition. It comprises more than 100K words that span 11 Indian languages and English, sourced from over 6,500 scene images captured across various linguistic regions of India. The dataset is meticulously annotated and supports multiple scene text tasks: (i) Scene Text Detection, (ii) Script Identification, (iii) Cropped Word Recognition, and (iv) End-to-End Scene Text Recognition. We evaluated state-of-the-art models originally developed for English by fine-tuning them for Indian languages. Our results highlight the challenges and opportunities in Indian language scene text recognition. We believe that this dataset represents a significant step toward advancing research in this domain. All our models and data are open source.
CV | May 7
Smart Railway Obstruction Detection System using IoT and Computer Vision
Pravin Kumar, Mritunjay Shall Peelam, Ramakant Kumar et al.
Railway track intrusions pose a critical safety challenge for Indian Railways, encompassing wildlife incursions and deliberate malicious obstructions. The December 2025 collision in Assam, in which seven elephants were killed by the Rajdhani Express, underscores the urgency of effective real-time detection. Existing solutions such as the optical fiber-based Gajraj system suffer from prohibitive costs ($1000/km) and high false alarm rates, limiting deployment to only 20 of India's 101 elephant corridors. This paper proposes NETRA, a cost-effective, internet-independent intrusion detection system deployed on Raspberry Pi Zero W and Raspberry Pi 4 edge platforms. NETRA employs probabilistic sensor fusion integrating a PIR motion sensor and an HC-SR04 ultrasonic distance sensor with a tunable threshold (tau_c = 0.65), enabling event-driven camera activation that reduces unnecessary visual processing by 52%. Upon confirmed intrusion, edge-AI classification using MobileNet-SSD (Pi Zero) or YOLOv5 ONNX (Pi 4) identifies threats including humans, large animals, and track obstructions. Confirmed threats are transmitted via LoRa (868 MHz) to alert the locomotive driver within 2.4 seconds end-to-end. Experimental evaluation across 113 motion events demonstrated 95% detection accuracy with zero false alarms through probabilistic fusion, compared to 85% for binary methods. Raspberry Pi 4 with YOLOv5 achieved 83.5% elephant F1-score, a 5.6x improvement over Pi Zero's heuristic approach (14.8%). LoRa communication achieved 100% packet delivery across 1-2 km in field trials. NETRA reduces deployment cost by 75% ($247/km vs $1000/km for Gajraj) while providing unified detection of both wildlife and obstruction threats.
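The abstract does not state how NETRA combines the PIR and ultrasonic readings, so the following is only a minimal sketch of threshold-gated sensor fusion under stated assumptions. The weighted-sum rule, the equal weights, the linear proximity score, and all function names are hypothetical; only the threshold tau_c = 0.65 comes from the abstract, and the 400 cm range is the commonly quoted upper limit of the HC-SR04.

```python
from typing import Optional

TAU_C = 0.65          # camera-activation threshold quoted in the abstract
MAX_RANGE_CM = 400.0  # commonly quoted HC-SR04 upper range (assumption here)


def proximity_score(distance_cm: Optional[float],
                    max_range_cm: float = MAX_RANGE_CM) -> float:
    """Map an ultrasonic distance to [0, 1]; closer objects score higher.

    None means no echo was received. The linear mapping is an
    illustrative choice, not the paper's model.
    """
    if distance_cm is None or distance_cm >= max_range_cm:
        return 0.0
    return 1.0 - max(distance_cm, 0.0) / max_range_cm


def fused_confidence(pir_triggered: bool,
                     distance_cm: Optional[float],
                     w_pir: float = 0.5,
                     w_us: float = 0.5) -> float:
    """Hypothetical weighted sum of the two sensor cues."""
    pir_score = 1.0 if pir_triggered else 0.0
    return w_pir * pir_score + w_us * proximity_score(distance_cm)


def should_activate_camera(pir_triggered: bool,
                           distance_cm: Optional[float],
                           tau_c: float = TAU_C) -> bool:
    """Wake the camera only when the fused confidence reaches tau_c."""
    return fused_confidence(pir_triggered, distance_cm) >= tau_c


if __name__ == "__main__":
    # Motion plus a nearby object: both cues agree, camera wakes.
    print(should_activate_camera(True, 100.0))
    # Motion alone (no echo): stays below the threshold.
    print(should_activate_camera(True, None))
    # Nearby static object without motion: stays below the threshold.
    print(should_activate_camera(False, 40.0))
```

With these illustrative weights, neither sensor alone can reach the threshold, which is one way an event-driven gate can skip camera wake-ups that a binary OR of the two sensors would trigger.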
LG | Jan 29
Privacy-Preserving Sensor-Based Human Activity Recognition for Low-Resource Healthcare Using Classical Machine Learning
Ramakant Kumar, Pravin Kumar
Limited access to medical infrastructure forces elderly and vulnerable patients to rely on home-based care, often leading to neglect and poor adherence to therapeutic exercises such as yoga or physiotherapy. To address this gap, we propose a low-cost and automated human activity recognition (HAR) framework based on wearable inertial sensors and machine learning. Activity data, including walking, walking upstairs, walking downstairs, sitting, standing, and lying, were collected using accelerometer and gyroscope measurements. Four classical classifiers, Logistic Regression, Random Forest, Support Vector Machine (SVM), and k-Nearest Neighbors (k-NN), were evaluated and compared with the proposed Support Tensor Machine (STM). Experimental results show that SVM achieved an accuracy of 93.33 percent, while Logistic Regression, Random Forest, and k-NN achieved 91.11 percent. In contrast, STM significantly outperformed these models, achieving a test accuracy of 96.67 percent and the highest cross-validation accuracy of 98.50 percent. Unlike conventional methods, STM leverages tensor representations to preserve spatio-temporal motion dynamics, resulting in robust classification across diverse activities. The proposed framework demonstrates strong potential for remote healthcare, elderly assistance, child activity monitoring, yoga feedback, and smart home wellness, offering a scalable solution for low-resource and rural healthcare settings.
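To illustrate the distinction the abstract draws between tensor and vector inputs, here is a minimal sketch of windowing a six-channel inertial stream. It is not the authors' pipeline: the window length, step, toy data, and function names are all hypothetical. It only shows that a window kept as a (time x channel) array retains the axis structure a tensor-based model can exploit, whereas flattening merges both axes into one feature vector, as a conventional classifier input would.

```python
from typing import List

Window = List[List[float]]  # shape: (timesteps, channels)


def sliding_windows(samples: Window, window_len: int, step: int) -> List[Window]:
    """Split a (timesteps x channels) stream into overlapping windows."""
    last_start = len(samples) - window_len
    return [samples[start:start + window_len]
            for start in range(0, last_start + 1, step)]


def flatten(window: Window) -> List[float]:
    """Collapse a (timesteps x channels) window into a single vector."""
    return [value for timestep in window for value in timestep]


# Toy stream: 10 timesteps x 6 channels (ax, ay, az, gx, gy, gz).
# Values are synthetic placeholders, not data from the paper.
stream = [[float(t * 10 + c) for c in range(6)] for t in range(10)]

windows = sliding_windows(stream, window_len=4, step=2)

if __name__ == "__main__":
    print(len(windows), len(windows[0]), len(windows[0][0]))  # windows, timesteps, channels
    print(len(flatten(windows[0])))                           # length of the flattened vector
```

In the flattened form, element 6 is simply the first channel of the second timestep; the model no longer sees which values belong to the same instant or the same sensor axis unless it relearns that layout.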