Sanjay Kumar

CV
h-index16
13papers
350citations
Novelty31%
AI Score49

13 Papers

IVAug 11, 2023
Classification of White Blood Cells Using Machine and Deep Learning Models: A Systematic Review

Rabia Asghar, Sanjay Kumar, Paul Hynds et al.

Machine learning (ML) and deep learning (DL) models have been employed to significantly improve analyses of medical imagery, with these approaches used to enhance the accuracy of prediction and classification. Model predictions and classifications assist diagnoses of various cancers and tumors. This review presents an in-depth analysis of modern techniques applied within the domain of medical image analysis for white blood cell classification. The methodologies that use blood smear images, magnetic resonance imaging (MRI), X-rays, and similar medical imaging domains are identified and discussed, with a detailed analysis of ML/DL techniques applied to the classification of white blood cells (WBCs) representing the primary focus of the review. The data utilized in this research has been extracted from a collection of 136 primary papers that were published between the years 2006 and 2023. The most widely used techniques and best-performing white blood cell classification methods are identified. While the use of ML and DL for white blood cell classification has concurrently increased and improved in recent year, significant challenges remain - 1) Availability of appropriate datasets remain the primary challenge, and may be resolved using data augmentation techniques. 2) Medical training of researchers is recommended to improve current understanding of white blood cell structure and subsequent selection of appropriate classification models. 3) Advanced DL networks including Generative Adversarial Networks, R-CNN, Fast R-CNN, and faster R-CNN will likely be increasingly employed to supplement or replace current techniques.

CVNov 6, 2025
Evaluating the Impact of Weather-Induced Sensor Occlusion on BEVFusion for 3D Object Detection

Sanjay Kumar, Tim Brophy, Eoin Martino Grua et al.

Accurate 3D object detection is essential for automated vehicles to navigate safely in complex real-world environments. Bird's Eye View (BEV) representations, which project multi-sensor data into a top-down spatial format, have emerged as a powerful approach for robust perception. Although BEV-based fusion architectures have demonstrated strong performance through multimodal integration, the effects of sensor occlusions, caused by environmental conditions such as fog, haze, or physical obstructions, on 3D detection accuracy remain underexplored. In this work, we investigate the impact of occlusions on both camera and Light Detection and Ranging (LiDAR) outputs using the BEVFusion architecture, evaluated on the nuScenes dataset. Detection performance is measured using mean Average Precision (mAP) and the nuScenes Detection Score (NDS). Our results show that moderate camera occlusions lead to a 41.3% drop in mAP (from 35.6% to 20.9%) when detection is based only on the camera. On the other hand, LiDAR sharply drops in performance only under heavy occlusion, with mAP falling by 47.3% (from 64.7% to 34.1%), with a severe impact on long-range detection. In fused settings, the effect depends on which sensor is occluded: occluding the camera leads to a minor 4.1% drop (from 68.5% to 65.7%), while occluding LiDAR results in a larger 26.8% drop (to 50.1%), revealing the model's stronger reliance on LiDAR for the task of 3D object detection. Our results highlight the need for future research into occlusion-aware evaluation methods and improved sensor fusion techniques that can maintain detection accuracy in the presence of partial sensor failure or degradation due to adverse environmental conditions.

IVAug 11, 2023
Classification of All Blood Cell Images using ML and DL Models

Rabia Asghar, Sanjay Kumar, Paul Hynds et al.

Human blood primarily comprises plasma, red blood cells, white blood cells, and platelets. It plays a vital role in transporting nutrients to different organs, where it stores essential health-related data about the human body. Blood cells are utilized to defend the body against diverse infections, including fungi, viruses, and bacteria. Hence, blood analysis can help physicians assess an individual's physiological condition. Blood cells have been sub-classified into eight groups: Neutrophils, eosinophils, basophils, lymphocytes, monocytes, immature granulocytes (promyelocytes, myelocytes, and metamyelocytes), erythroblasts, and platelets or thrombocytes on the basis of their nucleus, shape, and cytoplasm. Traditionally, pathologists and hematologists in laboratories have examined these blood cells using a microscope before manually classifying them. The manual approach is slower and more prone to human error. Therefore, it is essential to automate this process. In our paper, transfer learning with CNN pre-trained models. VGG16, VGG19, ResNet-50, ResNet-101, ResNet-152, InceptionV3, MobileNetV2, and DenseNet-20 applied to the PBC dataset's normal DIB. The overall accuracy achieved with these models lies between 91.375 and 94.72%. Hence, inspired by these pre-trained architectures, a model has been proposed to automatically classify the ten types of blood cells with increased accuracy. A novel CNN-based framework has been presented to improve accuracy. The proposed CNN model has been tested on the PBC dataset normal DIB. The outcomes of the experiments demonstrate that our CNN-based framework designed for blood cell classification attains an accuracy of 99.91% on the PBC dataset. Our proposed convolutional neural network model performs competitively when compared to earlier results reported in the literature.

LGMay 17, 2024
Generative Artificial Intelligence: A Systematic Review and Applications

Sandeep Singh Sengar, Affan Bin Hasan, Sanjay Kumar et al.

In recent years, the study of artificial intelligence (AI) has undergone a paradigm shift. This has been propelled by the groundbreaking capabilities of generative models both in supervised and unsupervised learning scenarios. Generative AI has shown state-of-the-art performance in solving perplexing real-world conundrums in fields such as image translation, medical diagnostics, textual imagery fusion, natural language processing, and beyond. This paper documents the systematic review and analysis of recent advancements and techniques in Generative AI with a detailed discussion of their applications including application-specific models. Indeed, the major impact that generative AI has made to date, has been in language generation with the development of large language models, in the field of image translation and several other interdisciplinary applications of generative AI. Moreover, the primary contribution of this paper lies in its coherent synthesis of the latest advancements in these areas, seamlessly weaving together contemporary breakthroughs in the field. Particularly, how it shares an exploration of the future trajectory for generative AI. In conclusion, the paper ends with a discussion of Responsible AI principles, and the necessary ethical considerations for the sustainability and growth of these generative models.

CLMar 14
FLUX: Data Worth Training On

Gowtham, Sai Rupesh, Sanjay Kumar et al.

Modern large language model training is no longer limited by data availability, but by the inability of existing preprocessing pipelines to simultaneously achieve massive scale and high data quality. Current approaches are forced to sacrifice one for the other: either aggressively filtering to improve quality at the cost of severe token loss, or retaining large volumes of data while introducing substantial noise. In this work, we introduce FLUX, a preprocessing pipeline specifically designed to break this long-standing trade-off by maximizing token retention while enforcing rigorous quality control. Models trained on FLUX-curated data consistently outperform prior methods. A 3B-parameter model trained on 60B tokens with FLUX achieves 32.14% MMLU accuracy, surpassing the previous state-of-the-art pipeline DCLM (31.98%) and significantly outperforming FineWeb (29.88%). FLUX achieves the same aggregate score as a model trained on DCLM data using only 39B tokens, resulting in a 34.4% reduction in training compute. At the data level, FLUX extracts 50B usable tokens from a single dump (CC-MAIN-2025-51), compared to 40B from DCLM (+25% retention). FLUX-Base yields 192B tokens, exceeding FineWeb's 170B while still maintaining superior quality. Overall, FLUX establishes a new state of the art in web-scale data preprocessing by demonstrating that high retention, strong quality control, and computational efficiency can be achieved simultaneously, redefining the limits of scalable dataset construction for modern language models.

CVMay 7
Smart Railway Obstruction Detection System using IoT and Computer Vision

Pravin Kumar, Mritunjay Shall Peelam, Ramakant Kumar et al.

Railway track intrusions pose a critical safety challenge for Indian Railways, encompassing wildlife incursions and deliberate malicious obstructions. The December 2025 collision in Assam, in which seven elephants were killed by the Rajdhani Express, underscores the urgency of effective real-time detection. Existing solutions such as the optical fiber-based Gajraj system suffer from prohibitive costs (\$1000/km) and high false alarm rates, limiting deployment to only 20 of India's 101 elephant corridors. This paper proposes NETRA, a cost-effective, internet-independent intrusion detection system deployed on Raspberry Pi Zero W and Raspberry Pi 4 edge platforms. NETRA employs probabilistic sensor fusion integrating a PIR motion sensor and an HC-SR04 ultrasonic distance sensor with a tunable threshold (tau_c = 0.65), enabling event-driven camera activation that reduces unnecessary visual processing by 52%. Upon confirmed intrusion, edge-AI classification using MobileNet-SSD (Pi Zero) or YOLOv5 ONNX (Pi 4) identifies threats including humans, large animals, and track obstructions. Confirmed threats are transmitted via LoRa (868 MHz) to alert the locomotive driver within 2.4 seconds end-to-end. Experimental evaluation across 113 motion events demonstrated 95% detection accuracy with zero false alarms through probabilistic fusion, compared to 85% for binary methods. Raspberry Pi 4 with YOLOv5 achieved 83.5% elephant F1-score, a 5.6x improvement over Pi Zero's heuristic approach (14.8%). LoRa communication achieved 100% packet delivery across 1-2 km in field trials. NETRA reduces deployment cost by 75% (\$247/km vs \$1000/km for Gajraj) while providing unified detection of both wildlife and obstruction threats.

CVJan 10, 2025
Minimizing Occlusion Effect on Multi-View Camera Perception in BEV with Multi-Sensor Fusion

Sanjay Kumar, Hiep Truong, Sushil Sharma et al.

Autonomous driving technology is rapidly evolving, offering the potential for safer and more efficient transportation. However, the performance of these systems can be significantly compromised by the occlusion on sensors due to environmental factors like dirt, dust, rain, and fog. These occlusions severely affect vision-based tasks such as object detection, vehicle segmentation, and lane recognition. In this paper, we investigate the impact of various kinds of occlusions on camera sensor by projecting their effects from multi-view camera images of the nuScenes dataset into the Bird's-Eye View (BEV) domain. This approach allows us to analyze how occlusions spatially distribute and influence vehicle segmentation accuracy within the BEV domain. Despite significant advances in sensor technology and multi-sensor fusion, a gap remains in the existing literature regarding the specific effects of camera occlusions on BEV-based perception systems. To address this gap, we use a multi-sensor fusion technique that integrates LiDAR and radar sensor data to mitigate the performance degradation caused by occluded cameras. Our findings demonstrate that this approach significantly enhances the accuracy and robustness of vehicle segmentation tasks, leading to more reliable autonomous driving systems.

CLNov 22, 2025
Blu-WERP (Web Extraction and Refinement Pipeline): A Scalable Pipeline for Preprocessing Large Language Model Datasets

Gowtham, Sai Rupesh, Sanjay Kumar et al.

High-quality training data is fundamental to large language model (LLM) performance, yet existing preprocessing pipelines often struggle to effectively remove noise and unstructured content from web-scale corpora. This paper presents Blu-WERP, a novel data preprocessing pipeline designed to optimize the quality of Common Crawl WARC files for LLM training. We demonstrate that Blu-WERP significantly outperforms established baselines including DCLM across multiple model scales and evaluation benchmarks. Our pipeline processes CC WARC dumps, implementing advanced filtering and quality assessment mechanisms. We conducted comprehensive evaluations using models with 150M, 400M, 530M, 750M, and 1B parameters, testing against nine standard benchmarks categorized as World Knowledge & Reasoning, Language Understanding, and Commonsense Reasoning. Results show Blu-WERP consistently achieved superior performance across all model scales. At the 1B parameter scale, Relatively Blu-WERP demonstrates a 4.0% and 9.5% aggregate improvement over DCLM and Fineweb respectively, while achieving quality-per-token efficiency gain. Categorical analysis reveals 2.4% improvement in World Knowledge & Reasoning, 6.2% improvement in Language Understanding, and 4.2% improvement in Commonsense Reasoning. These results establish Blu-WERP as a state-of-the-art preprocessing pipeline that substantially improves LLM training data quality and downstream model performance with reduced computational cost. Our findings contribute to the growing body of research on data-centric AI, demonstrating that preprocessing pipeline design significantly impacts LLM capabilities. The Blu-WERP pipeline represents a practical advancement in data quality optimization, offering researchers and practitioners an effective solution for improving LLM training efficiency and model performance.

CVOct 21, 2025
Occluded nuScenes: A Multi-Sensor Dataset for Evaluating Perception Robustness in Automated Driving

Sanjay Kumar, Tim Brophy, Reenu Mohandas et al.

Robust perception in automated driving requires reliable performance under adverse conditions, where sensors may be affected by partial failures or environmental occlusions. Although existing autonomous driving datasets inherently contain sensor noise and environmental variability, very few enable controlled, parameterised, and reproducible degradations across multiple sensing modalities. This gap limits the ability to systematically evaluate how perception and fusion architectures perform under well-defined adverse conditions. To address this limitation, we introduce the Occluded nuScenes Dataset, a novel extension of the widely used nuScenes benchmark. For the camera modality, we release both the full and mini versions with four types of occlusions, two adapted from public implementations and two newly designed. For radar and LiDAR, we provide parameterised occlusion scripts that implement three types of degradations each, enabling flexible and repeatable generation of occluded data. This resource supports consistent, reproducible evaluation of perception models under partial sensor failures and environmental interference. By releasing the first multi-sensor occlusion dataset with controlled and reproducible degradations, we aim to advance research on robust sensor fusion, resilience analysis, and safety-critical perception in automated driving.

LGMar 13, 2025
GBSVR: Granular Ball Support Vector Regression

Reshma Rastogi, Ankush Bisht, Sanjay Kumar et al.

Support Vector Regression (SVR) and its variants are widely used to handle regression tasks, however, since their solution involves solving an expensive quadratic programming problem, it limits its application, especially when dealing with large datasets. Additionally, SVR uses an epsilon-insensitive loss function which is sensitive to outliers and therefore can adversely affect its performance. We propose Granular Ball Support Vector Regression (GBSVR) to tackle problem of regression by using granular ball concept. These balls are useful in simplifying complex data spaces for machine learning tasks, however, to the best of our knowledge, they have not been sufficiently explored for regression problems. Granular balls group the data points into balls based on their proximity and reduce the computational cost in SVR by replacing the large number of data points with far fewer granular balls. This work also suggests a discretization method for continuous-valued attributes to facilitate the construction of granular balls. The effectiveness of the proposed approach is evaluated on several benchmark datasets and it outperforms existing state-of-the-art approaches

LGJul 17, 2021
Self Training with Ensemble of Teacher Models

Soumyadeep Ghosh, Sanjay Kumar, Janu Verma et al.

In order to train robust deep learning models, large amounts of labelled data is required. However, in the absence of such large repositories of labelled data, unlabeled data can be exploited for the same. Semi-Supervised learning aims to utilize such unlabeled data for training classification models. Recent progress of self-training based approaches have shown promise in this area, which leads to this study where we utilize an ensemble approach for the same. A by-product of any semi-supervised approach may be loss of calibration of the trained model especially in scenarios where unlabeled data may contain out-of-distribution samples, which leads to this investigation on how to adapt to such effects. Our proposed algorithm carefully avoids common pitfalls in utilizing unlabeled data and leads to a more accurate and calibrated supervised model compared to vanilla self-training based student-teacher algorithms. We perform several experiments on the popular STL-10 database followed by an extensive analysis of our approach and study its effects on model accuracy and calibration.

ASAug 15, 2020
Experimental investigations of psychoacoustic characteristics of household vacuum cleaners

Sanjay Kumar, Wong Sze Wing, Teng Mingbang et al.

Vacuum cleaners are one of the most widely used household appliances associated with unpleasant noises. Previous studies have indicated the severity of vacuum cleaner noise and its impact on the users nearby. The quantified measurements of the generated noise standalone are not sufficient for the selection or designing of vacuum cleaners. The human perception must also be included for a better assessment of the quality of sound. A hybrid approach known as psychoacoustics, which comprises subjective and objective evaluations of sounds, is widely used in recent times. This paper focuses on the experimental assessment of psychoacoustical matrices for household vacuum cleaners. Three vacuum cleaners with different specifications have been selected as test candidates, and their sound qualities have been analyzed. Besides, the annoyance index has been assessed for these vacuum cleaners.

NCMar 9, 2018
Automated Classification of Hand-grip action on Objects using Machine Learning

Anju Mishra, Shanu Sharma, Sanjay Kumar et al.

Brain computer interface is the current area of research to provide assistance to disabled persons. To cope up with the growing needs of BCI applications, this paper presents an automated classification scheme for handgrip actions on objects by using Electroencephalography (EEG) data. The presented approach focuses on investigation of classifying correct and incorrect handgrip responses for objects by using EEG recorded patterns. The method starts with preprocessing of data, followed by extraction of relevant features from the epoch data in the form of discrete wavelet transform (DWT), and entropy measures. After computing feature vectors, artificial neural network classifiers used to classify the patterns into correct and incorrect handgrips on different objects. The proposed method was tested on real dataset, which contains EEG recordings from 14 persons. The results showed that the proposed approach is effective and may be useful to develop a variety of BCI based devices to control hand movements.