Muhammad Hussain

CV
h-index: 7
18 papers
4,180 citations
Novelty: 29%
AI Score: 30

18 Papers

CV · Jul 30, 2024
What is YOLOv5: A deep look into the internal features of the popular object detector

Rahima Khanam, Muhammad Hussain

This study presents a comprehensive analysis of the YOLOv5 object detection model, examining its architecture, training methodologies, and performance. Key components, including the Cross Stage Partial backbone and Path Aggregation Network, are explored in detail. The paper reviews the model's performance across various metrics and hardware platforms. Additionally, the study discusses the transition from Darknet to PyTorch and its impact on model development. Overall, this research provides insights into YOLOv5's capabilities, its position within the broader landscape of object detection, and why it is a popular choice for constrained edge deployment scenarios.

CV · Jul 3, 2024
YOLOv5, YOLOv8 and YOLOv10: The Go-To Detectors for Real-time Vision

Muhammad Hussain

This paper presents a comprehensive review of the evolution of the YOLO (You Only Look Once) object detection algorithm, focusing on YOLOv5, YOLOv8, and YOLOv10. We analyze the architectural advancements, performance improvements, and suitability for edge deployment across these versions. YOLOv5 introduced significant innovations such as the CSPDarknet backbone and Mosaic Augmentation, balancing speed and accuracy. YOLOv8 built upon this foundation with enhanced feature extraction and anchor-free detection, improving versatility and performance. YOLOv10 represents a leap forward with NMS-free training, spatial-channel decoupled downsampling, and large-kernel convolutions, achieving state-of-the-art performance with reduced computational overhead. Our findings highlight the progressive enhancements in accuracy, efficiency, and real-time performance, particularly emphasizing their applicability in resource-constrained environments. This review provides insights into the trade-offs between model complexity and detection accuracy, offering guidance for selecting the most appropriate YOLO version for specific edge computing applications.

CV · Aug 27, 2024
A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships

Gracile Astlin Pereira, Muhammad Hussain

Transformer-based models have transformed the landscape of natural language processing (NLP) and are increasingly applied to computer vision tasks with remarkable success. These models, renowned for their ability to capture long-range dependencies and contextual information, offer a promising alternative to traditional convolutional neural networks (CNNs) in computer vision. In this review paper, we provide an extensive overview of various transformer architectures adapted for computer vision tasks. We delve into how these models capture global context and spatial relationships in images, empowering them to excel in tasks such as image classification, object detection, and segmentation. Analyzing the key components, training methodologies, and performance metrics of transformer-based models, we highlight their strengths, limitations, and recent advancements. Additionally, we discuss potential research directions and applications of transformer-based models in computer vision, offering insights into their implications for future advancements in the field.

CV · Jul 30, 2024
A Comparative Analysis of YOLOv5, YOLOv8, and YOLOv10 in Kitchen Safety

Athulya Sundaresan Geetha, Muhammad Hussain

Knife safety in the kitchen is essential for preventing accidents and injuries, with an emphasis on proper handling, maintenance, and storage methods. This research presents a comparative analysis of three YOLO models, YOLOv5, YOLOv8, and YOLOv10, for detecting the hazards involved in handling knives. It concentrates mainly on two of them: ensuring that fingers are curled while holding items to be cut, and ensuring that hands are in contact only with the knife handle and not the blade. Precision, recall, F-score, and the normalized confusion matrix are used to evaluate the performance of the models. The results indicate that YOLOv5 performed better than the other two models in identifying the hazard of hands touching the blade, while YOLOv8 excelled in detecting the hazard related to curled fingers while holding items. YOLOv5 and YOLOv8 performed almost identically in recognizing classes such as hand, knife, and vegetable, whereas all three models accurately identified the cutting board. This paper provides insights into the advantages and shortcomings of these models in real-world settings. Moreover, by detailing the optimization of YOLO architectures for safe knife handling, this study promotes the development of more accurate and efficient safety surveillance systems.
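
As a reminder of how the evaluation metrics named in this abstract relate to each other, here is a minimal sketch that computes precision, recall, and F1 (the balanced F-score) from per-class detection counts. The function name and the counts are illustrative assumptions, not values or code from the paper.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive,
    and false-negative counts for a single class."""
    # Precision: fraction of predicted detections that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of ground-truth objects that were detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for one class (illustrative only, not from the paper).
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The guards against zero denominators return 0.0 for classes with no predictions or no ground-truth instances; other conventions (for example, reporting such classes as undefined) are equally reasonable.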

CV · Oct 23, 2024
YOLOv11: An Overview of the Key Architectural Enhancements

Rahima Khanam, Muhammad Hussain

This study presents an architectural analysis of YOLOv11, the latest iteration in the YOLO (You Only Look Once) series of object detection models. We examine the model's architectural innovations, including the introduction of the C3k2 (Cross Stage Partial with kernel size 2) block, SPPF (Spatial Pyramid Pooling - Fast), and C2PSA (Convolutional block with Parallel Spatial Attention) components, which improve the model's performance in several ways, such as enhanced feature extraction. The paper explores YOLOv11's expanded capabilities across various computer vision tasks, including object detection, instance segmentation, pose estimation, and oriented object detection (OBB). We review the model's performance improvements in terms of mean Average Precision (mAP) and computational efficiency compared to its predecessors, with a focus on the trade-off between parameter count and accuracy. Additionally, the study discusses YOLOv11's versatility across different model sizes, from nano to extra-large, catering to diverse application needs from edge devices to high-performance computing environments. Our research provides insights into YOLOv11's position within the broader landscape of object detection and its potential impact on real-time computer vision applications.

CV · Aug 12, 2024
From SAM to SAM 2: Exploring Improvements in Meta's Segment Anything Model

Athulya Sundaresan Geetha, Muhammad Hussain

The Segment Anything Model (SAM), introduced to the computer vision community by Meta in April 2023, is a groundbreaking tool that allows automated segmentation of objects in images based on prompts such as text, clicks, or bounding boxes. SAM excels in zero-shot performance, segmenting unseen objects without additional training, supported by a large dataset of over one billion image masks. SAM 2 expands this functionality to video, leveraging memory from preceding and subsequent frames to generate accurate segmentation across entire videos, enabling near real-time performance. This comparison shows how SAM has evolved to meet the growing need for precise and efficient segmentation in various applications. The study suggests that future advancements in models like SAM will be crucial for improving computer vision technology.

CV · Feb 20, 2025
YOLOv12: A Breakdown of the Key Architectural Features

Mujadded Al Rabbani Alif, Muhammad Hussain

This paper presents an architectural analysis of YOLOv12, a significant advancement in single-stage, real-time object detection that builds upon the strengths of its predecessors while introducing key improvements. The model incorporates an optimised backbone (R-ELAN), 7x7 separable convolutions, and FlashAttention-driven area-based attention, yielding improved feature extraction, enhanced efficiency, and more robust detections. With multiple model variants, similar to its predecessors, YOLOv12 offers scalable solutions for both latency-sensitive and high-accuracy applications. Experimental results show consistent gains in mean average precision (mAP) and inference speed, making YOLOv12 a compelling choice for applications in autonomous systems, security, and real-time analytics. By achieving an optimal balance between computational efficiency and performance, YOLOv12 sets a new benchmark for real-time computer vision, facilitating deployment across diverse hardware platforms, from edge devices to high-performance clusters.

CV · Apr 16, 2025
A Review of YOLOv12: Attention-Based Enhancements vs. Previous Versions

Rahima Khanam, Muhammad Hussain

The YOLO (You Only Look Once) series has been a leading framework in real-time object detection, consistently improving the balance between speed and accuracy. However, integrating attention mechanisms into YOLO has been challenging due to their high computational overhead. YOLOv12 introduces a novel approach that successfully incorporates attention-based enhancements while preserving real-time performance. This paper provides a comprehensive review of YOLOv12's architectural innovations, including Area Attention for computationally efficient self-attention, Residual Efficient Layer Aggregation Networks for improved feature aggregation, and FlashAttention for optimized memory access. Additionally, we benchmark YOLOv12 against prior YOLO versions and competing object detectors, analyzing its improvements in accuracy, inference speed, and computational efficiency. Through this analysis, we demonstrate how YOLOv12 advances real-time object detection by refining the latency-accuracy trade-off and optimizing computational resources.

CV · Jun 14, 2024
YOLOv1 to YOLOv10: A comprehensive review of YOLO variants and their application in the agricultural domain

Mujadded Al Rabbani Alif, Muhammad Hussain

This survey investigates the transformative potential of various YOLO variants, from YOLOv1 to the state-of-the-art YOLOv10, in the context of agricultural advancements. The primary objective is to elucidate how these cutting-edge object detection models can re-energise and optimize diverse aspects of agriculture, ranging from crop monitoring to livestock management. It aims to achieve key objectives, including the identification of contemporary challenges in agriculture, a detailed assessment of YOLO's incremental advancements, and an exploration of its specific applications in agriculture. This is one of the first surveys to include the latest YOLOv10, offering a fresh perspective on its implications for precision farming and sustainable agricultural practices in the era of Artificial Intelligence and automation. Further, the survey undertakes a critical analysis of YOLO's performance, synthesizes existing research, and projects future trends. By scrutinizing the unique capabilities packed in YOLO variants and their real-world applications, this survey provides valuable insights into the evolving relationship between YOLO variants and agriculture. The findings contribute towards a nuanced understanding of the potential for precision farming and sustainable agricultural practices, marking a significant step forward in the integration of advanced object detection technologies within the agricultural sector.

IV · Oct 8, 2021
Designing the Architecture of a Convolutional Neural Network Automatically for Diabetic Retinopathy Diagnosis

Fahman Saeed, Muhammad Hussain, Hatim A Aboalsamh et al.

The prevalence of diabetic retinopathy (DR) has reached 34.6% worldwide and is a major cause of blindness among middle-aged diabetic patients. Regular DR screening using fundus photography helps detect its complications and prevent its progression to advanced levels. As manual screening is time-consuming and subjective, machine learning (ML) and deep learning (DL) have been employed to aid graders. However, the existing CNN-based methods use either pre-trained CNN models or a brute-force approach to design new CNN models, which are not customized to the complexity of fundus images. To overcome this issue, we introduce an approach for the custom design of CNN models, whose architectures are adapted to the structural patterns of fundus images and better represent the DR-relevant features. It leverages k-medoid clustering, principal component analysis (PCA), and inter-class and intra-class variations to automatically determine the depth and width of a CNN model. The designed models are lightweight, adapted to the internal structures of fundus images, and encode the discriminative patterns of DR lesions. The technique is validated on a local dataset from King Saud University Medical City, Saudi Arabia, and two challenging benchmark datasets from Kaggle: EyePACS and APTOS2019. The custom-designed models outperform well-known pre-trained CNN models such as ResNet152, DenseNet121, and ResNeSt50 with a significant decrease in the number of parameters, and compete well with the state-of-the-art CNN-based DR screening methods. The proposed approach is helpful for DR screening under diverse clinical settings and for referring patients who may need further assessment and treatment to expert ophthalmologists.

IV · Sep 10, 2021
A Deep Learning-Based Unified Framework for Red Lesions Detection on Retinal Fundus Images

Norah Asiri, Muhammad Hussain, Fadwa Al Adel

Red lesions, i.e., microaneurysms (MAs) and hemorrhages (HMs), are the early signs of diabetic retinopathy (DR). The automatic detection of MAs and HMs on retinal fundus images is a challenging task. Most of the existing methods detect either only MAs or only HMs because of the differences in their texture, size, and morphology. Though some methods detect both MAs and HMs, they suffer from the curse of dimensionality of shape and color features and fail to detect all shape variations of HMs, such as flame-shaped ones. Leveraging the progress in deep learning, we proposed a two-stream red lesion detection system dealing simultaneously with small and large red lesions. For this system, we introduced a new ROI candidate generation method for large red lesions on fundus images; it is based on blood vessel segmentation and morphological operations, reduces the computational complexity, and enhances the detection accuracy by generating a small number of potential candidates. For detection, we proposed a framework with two streams. We used a pretrained VGGNet as the backbone model and carried out extensive experiments to tune it for vessel segmentation and candidate generation, and finally for learning the appropriate mapping, which yields better detection of the red lesions compared with the state-of-the-art methods. The experimental results validated the effectiveness of the system in the detection of both MAs and HMs. For per-lesion detection, it yields a sensitivity (SN) of 0.8589 and FROC = 0.7518 under 8 FPIs on DiaretDB1-MA; SN = 0.7552 and a good FROC score under 2, 4, and 8 FPIs on DiaretDB1-HM; SN = 0.8157 with an overall FROC = 0.4537 on e-ophtha; and FROC = 0.3461 on the ROCh dataset, which is higher than the state-of-the-art methods. For DR screening, the system performs well, with good AUC on the DiaretDB1-MA, DiaretDB1-HM, and e-ophtha datasets.

LG · Apr 30, 2019
Automatic Emotion Recognition (AER) System based on Two-Level Ensemble of Lightweight Deep CNN Models

Emad-ul-Haq Qazi, Muhammad Hussain, Hatim AboAlsamh et al.

Emotions play a crucial role in human interaction, health care, and security investigations and monitoring. Automatic emotion recognition (AER) using electroencephalogram (EEG) signals is an effective method for decoding real emotions, which are independent of body gestures, but it is a challenging problem. Several automatic emotion recognition systems have been proposed that are based on traditional hand-engineered approaches, and their performance is very poor. Motivated by the outstanding performance of deep learning (DL) in many recognition tasks, we introduce an AER system (Deep-AER) based on EEG brain signals using DL. A DL model involves a large number of learnable parameters, and its training needs a large dataset of EEG signals, which is difficult to acquire for the AER problem. To overcome this problem, we proposed a lightweight pyramidal one-dimensional convolutional neural network (LP-1D-CNN) model, which involves a small number of learnable parameters. Using LP-1D-CNN, we build a two-level ensemble model. In the first level of the ensemble, each channel is scanned incrementally by LP-1D-CNN to generate predictions, which are fused using majority vote. The second level of the ensemble combines the predictions of all channels of an EEG signal using majority vote to detect the emotion state. We validated the effectiveness and robustness of Deep-AER using DEAP, a benchmark dataset for emotion recognition research. The results indicate that FRONT plays a dominant role in AER, and over this region Deep-AER achieved accuracies of 98.43% and 97.65% for two AER problems, i.e., high valence vs low valence (HV vs LV) and high arousal vs low arousal (HA vs LA), respectively. The comparison reveals that Deep-AER outperforms the state-of-the-art systems by a large margin. The Deep-AER system will be helpful in monitoring for health care and security investigations.
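
The two-level fusion described in this abstract can be sketched as follows. This is only an illustration of nested majority voting, assuming the per-window predictions of each channel are already available as labels; the channel names, label strings, and function names are hypothetical, and the LP-1D-CNN model that would produce the predictions is not shown.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(labels: List[str]) -> str:
    """Return the most frequent label. Ties go to the label seen first,
    which is an assumption; the abstract does not specify tie handling."""
    return Counter(labels).most_common(1)[0][0]


def two_level_vote(window_preds: Dict[str, List[str]]) -> str:
    """Fuse predictions in two stages.

    Level 1: majority vote over the window-level predictions of each channel.
    Level 2: majority vote over the resulting per-channel decisions.
    """
    channel_decisions = [majority_vote(preds) for preds in window_preds.values()]
    return majority_vote(channel_decisions)


# Hypothetical window-level predictions for three channels (illustrative only).
preds = {
    "Fp1": ["HV", "HV", "LV"],
    "Fp2": ["LV", "HV", "HV"],
    "F3": ["LV", "LV", "HV"],
}
print(two_level_vote(preds))
```

In this toy input, two of the three channels vote HV at the first level, so the second-level decision is HV.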

LG · Apr 30, 2019
An Efficient Intelligent System for the Classification of Electroencephalography (EEG) Brain Signals using Nuclear Features for Human Cognitive Tasks

Emad-ul-Haq Qazi, Muhammad Hussain, Hatim Aboalsamh

Representation and classification of electroencephalography (EEG) brain signals are critical processes for their analysis in cognitive tasks. In particular, extraction of discriminative features from raw EEG signals, without any pre-processing, is a challenging task. Motivated by the nuclear norm, we observed that there is a significant difference between the variances of EEG signals captured from the same brain region when a subject performs different tasks. This observation led us to use singular value decomposition to compute the dominant variances of EEG signals captured from a certain brain region while performing a certain task and to use them as features (nuclear features). A simple and efficient class-means-based minimum distance classifier (CMMDC) is enough to predict brain states. This approach results in a feature space of significantly small dimension and gives equally good classification results on clean as well as raw data. We validated the effectiveness and robustness of the technique using four datasets of different tasks: fluid intelligence clean data (FICD), fluid intelligence raw data (FIRD), memory recall task (MRT), and eyes open / eyes closed task (EOEC). For each task, we analyzed EEG signals over six different brain regions with 8, 16, 20, 18, 18 and 100 electrodes. The nuclear features from the frontal brain region gave 100% prediction accuracy. The discriminant analysis of the nuclear features has been conducted using intra-class and inter-class variations. Comparisons with the state-of-the-art techniques showed the superiority of the proposed system.
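
A class-means minimum distance classifier of the kind named in this abstract can be sketched in a few lines. The sketch assumes the feature vectors have already been extracted (the SVD-based feature step is not shown) and uses Euclidean distance, which is an assumption because the abstract does not name the metric. Function names, labels, and numbers are illustrative.

```python
import math
from typing import Dict, List, Sequence


def fit_class_means(features: Sequence[Sequence[float]],
                    labels: Sequence[str]) -> Dict[str, List[float]]:
    """Compute the mean feature vector of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(class_means: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Assign x to the class whose mean is nearest (Euclidean distance assumed)."""
    return min(class_means, key=lambda y: math.dist(class_means[y], x))


# Hypothetical two-dimensional feature vectors (illustrative only).
train_x = [[1.0, 0.1], [1.2, 0.0], [5.0, 2.0], [5.2, 2.2]]
train_y = ["rest", "rest", "task", "task"]
means = fit_class_means(train_x, train_y)
print(predict(means, [1.0, 0.2]), predict(means, [4.9, 1.9]))
```

Training reduces to one mean per class, so the model stores only as many vectors as there are classes, which fits the abstract's emphasis on a small feature space and a simple classifier.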

LG · Apr 30, 2019
Eigen Values Features for the Classification of Brain Signals corresponding to 2D and 3D Educational Contents

Saeed Bamatraf, Muhammad Hussain, Emad-ul-Haq Qazi et al.

In this paper, we propose a brain signal classification method that uses eigenvalues of the covariance matrix as features to classify images (topomaps) created from the brain signals. The signals are recorded while subjects answer 2D and 3D questions. The system is used to classify the correct and incorrect answers for both 2D and 3D questions. Using the classification technique, the impacts of 2D and 3D multimedia educational contents on learning, memory retention and recall are compared. The subjects learn similar 2D and 3D educational contents. Afterwards, subjects are asked 20 multiple-choice questions (MCQs) associated with the contents after thirty minutes (Short-Term Memory, STM) and two months (Long-Term Memory, LTM). Eigenvalue features extracted from topomap images are given to K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) classifiers in order to identify the states of the brain related to incorrect and correct answers. Both classifiers obtained excellent accuracies, and statistical analysis of the results indicates no significant difference between 2D and 3D multimedia educational contents in learning, memory retention and recall in both STM and LTM.

CV · Apr 30, 2019
Alignment-Free Cross-Sensor Fingerprint Matching based on the Co-Occurrence of Ridge Orientations and Gabor-HoG Descriptor

Helala AlShehri, Muhammad Hussain, Hatim AboAlSamh et al.

The existing automatic fingerprint verification methods are designed to work under the assumption that the same sensor is used for enrollment and authentication (regular matching). There is a remarkable decrease in efficiency when one type of contact-based sensor is employed for enrollment and another type of contact-based sensor is used for authentication (the cross-matching or fingerprint sensor interoperability problem). The ridge orientation patterns in a fingerprint are invariant to sensor type. Based on this observation, we propose a robust fingerprint descriptor called the co-occurrence of ridge orientations (Co-Ror), which encodes the spatial distribution of ridge orientations. Employing this descriptor, we introduce an efficient automatic fingerprint verification method for the cross-matching problem. Further, to enhance the robustness of the method, we incorporate scale-based ridge orientation information through the Gabor-HoG descriptor. The two descriptors are fused with canonical correlation analysis (CCA), and the matching score between two fingerprints is calculated using city-block distance. The proposed method is alignment-free and can handle the matching process without the need for a registration step. Intensive experiments on two benchmark databases (FingerPass and MOLF) show the effectiveness of the method and reveal its significant enhancement over the state-of-the-art methods such as VeriFinger (a commercial SDK), minutia cylinder-code (MCC), MCC with scale, and the thin-plate spline (TPS) model. The proposed research will help security agencies, service providers and law-enforcement departments to overcome the interoperability problem of contact sensors of different technology and interaction types.
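
The city-block (L1, or Manhattan) distance used for the matching score in this abstract is the sum of absolute coordinate differences between two descriptor vectors. The sketch below shows only that distance; the descriptor values are made up, and the Co-Ror, Gabor-HoG, and CCA fusion steps that would produce real descriptors are not shown.

```python
from typing import Sequence


def city_block_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the L1 distance between two equal-length descriptor vectors.
    A smaller value means the two descriptors are more similar."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical fused descriptors of two fingerprints (illustrative only).
enrolled = [0.5, 1.0, 2.0]
probe = [1.0, 1.0, 0.5]
print(city_block_distance(enrolled, probe))
```

A verification decision would then compare this distance against a threshold chosen on validation data; the abstract does not state how that threshold is set.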

CV · Nov 3, 2018
Deep Learning based Computer-Aided Diagnosis Systems for Diabetic Retinopathy: A Survey

Norah Asiri, Muhammad Hussain, Fadwa Al Adel et al.

Diabetic retinopathy (DR) results in vision loss if not treated early. A computer-aided diagnosis (CAD) system based on retinal fundus images is an efficient and effective method for early DR diagnosis and for assisting experts. Such a system involves various stages, like detection, segmentation and classification of lesions in fundus images. Many traditional machine-learning (ML) techniques based on hand-engineered features have been introduced. The recent emergence of deep learning (DL) and its decisive victory over traditional ML methods in various applications motivated researchers to employ it for DR diagnosis, and many deep-learning-based methods have been introduced. In this paper, we review these methods, highlighting their pros and cons. In addition, we point out the challenges to be addressed in designing and training efficient, effective and robust deep-learning algorithms for various problems in DR diagnosis, and draw attention to directions for future research.

CV · Jan 16, 2018
ConvSRC: SmartPhone based Periocular Recognition using Deep Convolutional Neural Network and Sparsity Augmented Collaborative Representation

Amani Alahmadi, Muhammad Hussain, Hatim Aboalsamh et al.

Smartphone-based periocular recognition has gained significant attention from the biometric research community because of the limitations of biometric modalities like face, iris, etc. Most of the existing methods for periocular recognition employ hand-crafted features. Recently, learning-based image representation techniques like deep Convolutional Neural Networks (CNNs) have shown outstanding performance in many visual recognition tasks. A CNN needs a huge volume of data for its learning, but for periocular recognition only a limited amount of data is available. The solution is to use a CNN pre-trained on a dataset from a related domain; in this case, the challenge is to efficiently extract the discriminative features. Using a pretrained CNN model (VGG-Net), we propose a simple, efficient and compact image representation technique that takes into account the wealth of information and sparsity existing in the activations of the convolutional layers and employs principal component analysis. For recognition, we use an efficient and robust Sparsity Augmented Collaborative Representation based Classification (SA-CRC) technique. For thorough evaluation of ConvSRC (the proposed system), experiments were carried out on the challenging VISOB database, which was presented for the periocular recognition competition at ICIP2016. The obtained results show the superiority of ConvSRC over the state-of-the-art methods; it obtains a GMR of more than 99% at FMR = 10^-3 and outperforms the first-place winner of the ICIP2016 challenge by 10%.

CV · Jan 16, 2018
An Automated System for Epilepsy Detection using EEG Brain Signals based on Deep Learning Approach

Ihsan Ullah, Muhammad Hussain, Emad-ul-Haq Qazi et al.

Epilepsy is a neurological disorder, and electroencephalography (EEG) is a commonly used clinical approach for its detection. Manual inspection of EEG brain signals is a time-consuming and laborious process, which puts a heavy burden on neurologists and affects their performance. Several automatic techniques have been proposed using traditional approaches to assist neurologists in detecting binary epilepsy scenarios, e.g. seizure vs. non-seizure or normal vs. ictal. These methods do not perform well when classifying the ternary case, e.g. ictal vs. normal vs. inter-ictal; the maximum accuracy for this case by the state-of-the-art methods is 97±1%. To overcome this problem, we propose a system based on deep learning, which is an ensemble of pyramidal one-dimensional convolutional neural network (P-1D-CNN) models. In a CNN model, the bottleneck is the large number of learnable parameters. P-1D-CNN works on the concept of a refinement approach, and it results in 60% fewer parameters compared to traditional CNN models. Further, to overcome the limitations of a small amount of data, we proposed augmentation schemes for learning the P-1D-CNN model. In almost all the cases concerning epilepsy detection, the proposed system gives an accuracy of 99.1±0.9% on the University of Bonn dataset.