Sunil Aryal

CV
h-index: 68
Papers: 51
Citations: 844
Novelty: 35%
AI Score: 50

51 Papers

CV Sep 22, 2024 Code
Margin-bounded Confidence Scores for Out-of-Distribution Detection

Lakpa D. Tamang, Mohamed Reda Bouadjenek, Richard Dazeley et al.

In many critical machine learning applications, such as autonomous driving and medical image diagnosis, the detection of out-of-distribution (OOD) samples is as crucial as accurately classifying in-distribution (ID) inputs. Recently, Outlier Exposure (OE) based methods have shown promising results in detecting OOD inputs via model fine-tuning with auxiliary outlier data. However, most previous OE-based approaches focus on synthesizing extra outlier samples or introducing regularization to diversify the OOD sample space, which is rather difficult to quantify in practice. In this work, we propose a novel and straightforward method called Margin-bounded Confidence Scores (MaCS) to address the nontrivial OOD detection problem by enlarging the disparity between ID and OOD scores, which in turn makes the decision boundary more compact and facilitates effective segregation with a simple threshold. Specifically, we augment the learning objective of an OE-regularized classifier with a supplementary constraint that penalizes high confidence scores for OOD inputs relative to those of ID inputs; this significantly enhances OOD detection performance while maintaining ID classification accuracy. Extensive experiments on various benchmark datasets for image classification tasks demonstrate the effectiveness of the proposed method, which significantly outperforms state-of-the-art (SOTA) methods on various benchmarking metrics. The code is publicly available at https://github.com/lakpa-tamang9/margin_ood
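As a rough illustration of the margin idea in this abstract, the sketch below returns a penalty whenever the mean confidence on OOD inputs is not at least a margin below the mean confidence on ID inputs. The function names, the use of maximum softmax probability as the confidence score, the hinge form, and the margin value are all assumptions made for illustration; they are not taken from the paper, and the linked repository should be consulted for the actual objective.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def max_confidence(logits):
    """Confidence score: maximum softmax probability (assumed choice)."""
    return max(softmax(logits))

def margin_penalty(id_logits, ood_logits, margin=0.5):
    """Hinge penalty (hypothetical form): zero once the mean ID confidence
    exceeds the mean OOD confidence by at least `margin`."""
    id_conf = sum(max_confidence(l) for l in id_logits) / len(id_logits)
    ood_conf = sum(max_confidence(l) for l in ood_logits) / len(ood_logits)
    return max(0.0, margin - (id_conf - ood_conf))

# Made-up logits: confident ID predictions, near-uniform OOD predictions.
id_batch = [[6.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
ood_batch = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1]]
confident_ood = [[6.0, 0.0, 0.0]]  # an OOD input the model is sure about

print(margin_penalty(id_batch, ood_batch))      # gap already exceeds margin
print(margin_penalty(id_batch, confident_ood))  # penalised: gap too small
```

In a training loop, a term of this kind would be added to the classification and OE losses with some weight; that weight is another free choice not specified here.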

IV Nov 28, 2023
Empowering COVID-19 Detection: Optimizing Performance Through Fine-Tuned EfficientNet Deep Learning Architecture

Md. Alamin Talukder, Md. Abu Layek, Mohsin Kazi et al.

The worldwide COVID-19 pandemic has profoundly influenced the health and everyday experiences of individuals across the planet. COVID-19 is a highly contagious respiratory disease requiring early and accurate detection to curb its rapid transmission. Initial testing methods primarily revolved around identifying the genetic composition of the coronavirus, exhibiting a relatively low detection rate and requiring a time-intensive procedure. To address this challenge, experts have suggested using radiological imagery, particularly chest X-rays, as a valuable approach within the diagnostic protocol. This study investigates the potential of combining radiographic imaging (X-rays) with deep learning algorithms to swiftly and precisely identify COVID-19 patients. The proposed approach improves detection accuracy by fine-tuning various established transfer learning models with appropriate layers. The experiments were conducted on a COVID-19 X-ray dataset containing 2,000 images. The fine-tuned EfficientNetB4 model achieved an accuracy of 100% on this dataset, showcasing its potential as a robust COVID-19 detection model. Furthermore, EfficientNetB4 excelled in identifying lung disease on a chest X-ray dataset containing 4,350 images, achieving an accuracy of 99.17%, precision of 99.13%, recall of 99.16%, and F1-score of 99.14%. These results highlight the promise of fine-tuned transfer learning for efficient lung disease detection through medical imaging, especially with X-ray images. This research offers radiologists an effective means of aiding rapid and precise COVID-19 diagnosis and contributes valuable assistance for healthcare professionals in accurately identifying affected patients.

LG Oct 7, 2022
Elastic Step DQN: A novel multi-step algorithm to alleviate overestimation in Deep QNetworks

Adrian Ly, Richard Dazeley, Peter Vamplew et al.

The Deep Q-Networks algorithm (DQN) was the first reinforcement learning algorithm using deep neural networks to successfully surpass human-level performance in a number of Atari learning environments. However, divergent and unstable behaviour have been long-standing issues in DQNs. The unstable behaviour is often characterised by overestimation in the $Q$-values, commonly referred to as the overestimation bias. To address the overestimation bias and the divergent behaviour, a number of heuristic extensions have been proposed. Notably, multi-step updates have been shown to drastically reduce unstable behaviour while improving the agent's training performance. However, agents are often highly sensitive to the selection of the multi-step update horizon ($n$), and our empirical experiments show that a poorly chosen static value for $n$ can in many cases lead to worse performance than single-step DQN. Inspired by the success of $n$-step DQN and the effects that multi-step updates have on overestimation bias, this paper proposes a new algorithm that we call "Elastic Step DQN" (ES-DQN). It dynamically varies the step size horizon in multi-step updates based on the similarity of states visited. Our empirical evaluation shows that ES-DQN outperforms $n$-step updates with fixed $n$, Double DQN and Average DQN in several OpenAI Gym environments while at the same time alleviating the overestimation bias.
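For readers unfamiliar with multi-step updates, the sketch below computes the standard $n$-step target, i.e. the discounted sum of $n$ observed rewards plus the discounted bootstrap value $\max_a Q(s_{t+n}, a)$. It covers only the fixed-horizon case; the rule ES-DQN uses to vary the horizon from state similarity is not reproduced here, and the function name and the numbers are hypothetical.

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """Multi-step TD target: discounted sum of the observed rewards plus the
    discounted bootstrap value, where n is simply len(rewards)."""
    target = 0.0
    for k, r in enumerate(rewards):
        target += (gamma ** k) * r
    target += (gamma ** len(rewards)) * bootstrap_value
    return target

# Made-up rewards; the same bootstrap value is reused only to keep the
# arithmetic easy to follow (in practice it comes from a different state).
rewards = [1.0, 0.0, 2.0]
one_step = n_step_target(rewards[:1], bootstrap_value=10.0, gamma=0.5)
three_step = n_step_target(rewards, bootstrap_value=10.0, gamma=0.5)
print(one_step, three_step)
```

With a longer horizon the bootstrap term is discounted more heavily, which is one intuition for why multi-step updates can reduce the influence of overestimated $Q$-values.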

CV Feb 2, 2023
SHINE: Deep Learning-Based Accessible Parking Management System

Dhiraj Neupane, Aashish Bhattarai, Sunil Aryal et al.

The ongoing expansion of urban areas, facilitated by advancements in science and technology, has resulted in a considerable increase in the number of privately owned vehicles worldwide, including in South Korea. However, this gradual increase in the number of vehicles has inevitably led to parking-related issues, including the abuse of disabled parking spaces (hereafter referred to as accessible parking spaces) designated for individuals with disabilities. Traditional license plate recognition (LPR) systems have proven inefficient in addressing this problem in real time due to the high frame rate of surveillance cameras, the presence of natural and artificial noise, and variations in lighting and weather conditions that impede detection and recognition by these systems. With the growing concept of parking 4.0, many sensor-based, IoT-based and deep learning-based approaches have been applied to automatic LPR and parking management systems. Nonetheless, the studies show a need for a robust and efficient model for managing accessible parking spaces in South Korea. To address this, we propose a novel system called Shine, which uses a deep learning-based object detection algorithm to detect the vehicle, license plate, and disability badges (referred to as cards, badges, or access badges hereafter) and verifies the driver's right to use accessible parking spaces by coordinating with the central server. Our model, which achieves a mean average precision of 92.16%, is expected to address the issue of accessible parking space abuse and contribute significantly towards efficient and effective parking management in urban environments.

LG Feb 25
Learning Rewards, Not Labels: Adversarial Inverse Reinforcement Learning for Machinery Fault Detection

Dhiraj Neupane, Richard Dazeley, Mohamed Reda Bouadjenek et al.

Reinforcement learning (RL) offers significant promise for machinery fault detection (MFD). However, most existing RL-based MFD approaches do not fully exploit RL's sequential decision-making strengths, often treating MFD as a simple guessing game (Contextual Bandits). To bridge this gap, we formulate MFD as an offline inverse reinforcement learning problem, where the agent learns the reward dynamics directly from healthy operational sequences, thereby bypassing the need for manual reward engineering and fault labels. Our framework employs Adversarial Inverse Reinforcement Learning to train a discriminator that distinguishes between normal (expert) and policy-generated transitions. The discriminator's learned reward serves as an anomaly score, indicating deviations from normal operating behaviour. When evaluated on three run-to-failure benchmark datasets (HUMS2023, IMS, and XJTU-SY), the model consistently assigns low anomaly scores to normal samples and high scores to faulty ones, enabling early and robust fault detection. By aligning RL's sequential reasoning with MFD's temporal structure, this work opens a path toward RL-based diagnostics in data-driven industrial settings.

CV Feb 4
DMS2F-HAD: A Dual-branch Mamba-based Spatial-Spectral Fusion Network for Hyperspectral Anomaly Detection

Aayushma Pant, Lakpa Tamang, Tsz-Kwan Lee et al.

Hyperspectral anomaly detection (HAD) aims to identify rare and irregular targets in high-dimensional hyperspectral images (HSIs), which are often noisy and unlabelled. Existing deep learning methods either fail to capture long-range spectral dependencies (e.g., convolutional neural networks) or suffer from high computational cost (e.g., Transformers). To address these challenges, we propose DMS2F-HAD, a novel dual-branch Mamba-based model. Our architecture utilizes Mamba's linear-time modeling to efficiently learn distinct spatial and spectral features in specialized branches, which are then integrated by a dynamic gated fusion mechanism to enhance anomaly localization. Across fourteen benchmark HSI datasets, our proposed DMS2F-HAD not only achieves a state-of-the-art average AUC of 98.78%, but also demonstrates superior efficiency with an inference speed 4.6 times faster than comparable deep learning methods. The results highlight DMS2F-HAD's strong generalization and scalability, positioning it as a strong candidate for practical HAD applications.

CV Jul 8, 2023
HUMS2023 Data Challenge Result Submission

Dhiraj Neupane, Lakpa Dorje Tamang, Ngoc Dung Huynh et al.

We implemented a simple method for early detection in this research. The method consists of plotting the given mat files and analyzing scalogram images generated by applying the Continuous Wavelet Transform (CWT) to the samples. Computing the mean, standard deviation (STD), and peak-to-peak (P2P) values of each signal also helped detect signs of faults. We used the autoregressive integrated moving average (ARIMA) method to track the progression.
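The three summary statistics named in this abstract can be computed with the Python standard library alone, as in the minimal sketch below. The signals are made-up numbers rather than challenge data, the function name is hypothetical, and the choice of population (rather than sample) standard deviation is an assumption.

```python
import statistics

def signal_features(signal):
    """Return mean, standard deviation, and peak-to-peak value of one signal."""
    return {
        "mean": statistics.fmean(signal),
        "std": statistics.pstdev(signal),   # population STD (assumed convention)
        "p2p": max(signal) - min(signal),   # peak-to-peak amplitude
    }

# Made-up signals: the second has three times the amplitude of the first.
healthy = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
faulty = [0.0, 3.0, 0.0, -3.0, 0.0, 3.0, 0.0, -3.0]

for name, sig in (("healthy", healthy), ("faulty", faulty)):
    print(name, signal_features(sig))
```

Tracking how such features drift across successive recordings is one simple way to look for early signs of a developing fault.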

CV Aug 13, 2025 Code
TOTNet: Occlusion-Aware Temporal Tracking for Robust Ball Detection in Sports Videos

Hao Xu, Arbind Agrahari Baniya, Sam Wells et al.

Robust ball tracking under occlusion remains a key challenge in sports video analysis, affecting tasks like event detection and officiating. We present TOTNet, a Temporal Occlusion Tracking Network that leverages 3D convolutions, visibility-weighted loss, and occlusion augmentation to improve performance under partial and full occlusions. Developed in collaboration with Paralympics Australia, TOTNet is designed for real-world sports analytics. We introduce TTA, a new occlusion-rich table tennis dataset collected from professional-level Paralympic matches, comprising 9,159 samples with 1,996 occlusion cases. Evaluated on four datasets across tennis, badminton, and table tennis, TOTNet significantly outperforms prior state-of-the-art methods, reducing RMSE from 37.30 to 7.19 and improving accuracy on fully occluded frames from 0.63 to 0.80. These results demonstrate TOTNet's effectiveness for offline sports analytics in fast-paced scenarios. Code and data access: https://github.com/AugustRushG/TOTNet

CL May 14, 2025 Code
A Comprehensive Analysis of Large Language Model Outputs: Similarity, Diversity, and Bias

Brandon Smith, Mohamed Reda Bouadjenek, Tahsin Alamgir Kheya et al.

Large Language Models (LLMs) represent a major step toward artificial general intelligence, significantly advancing our ability to interact with technology. While LLMs perform well on Natural Language Processing tasks -- such as translation, generation, code writing, and summarization -- questions remain about their output similarity, variability, and ethical implications. For instance, how similar are texts generated by the same model? How does this compare across different models? And which models best uphold ethical standards? To investigate, we used 5,000 prompts spanning diverse tasks like generation, explanation, and rewriting. This resulted in approximately 3 million texts from 12 LLMs, including proprietary and open-source systems from OpenAI, Google, Microsoft, Meta, and Mistral. Key findings include: (1) outputs from the same LLM are more similar to each other than to human-written texts; (2) models like WizardLM-2-8x22b generate highly similar outputs, while GPT-4 produces more varied responses; (3) LLM writing styles differ significantly, with Llama 3 and Mistral showing higher similarity, and GPT-4 standing out for distinctiveness; (4) differences in vocabulary and tone underscore the linguistic uniqueness of LLM-generated content; (5) some LLMs demonstrate greater gender balance and reduced bias. These results offer new insights into the behavior and diversity of LLM outputs, helping guide future development and ethical evaluation.

LG Jan 8, 2025 Code
HI-PMK: A Data-Dependent Kernel for Incomplete Heterogeneous Data Representation

Youran Zhou, Mohamed Reda Bouadjenek, Jonathan Wells et al.

Handling incomplete and heterogeneous data remains a central challenge in real-world machine learning, where missing values may follow complex mechanisms (MCAR, MAR, MNAR) and features can be of mixed types (numerical and categorical). Existing methods often rely on imputation, which may introduce bias or privacy risks, or fail to jointly address data heterogeneity and structured missingness. We propose the Heterogeneous Incomplete Probability Mass Kernel (HI-PMK), a novel data-dependent representation learning approach that eliminates the need for imputation. HI-PMK introduces two key innovations: (1) a probability mass-based dissimilarity measure that adapts to local data distributions across heterogeneous features (numerical, ordinal, nominal), and (2) a missingness-aware uncertainty strategy (MaxU) that conservatively handles all three missingness mechanisms by assigning maximal plausible dissimilarity to unobserved entries. Our approach is privacy-preserving, scalable, and readily applicable to downstream tasks such as classification and clustering. Extensive experiments on over 15 benchmark datasets demonstrate that HI-PMK consistently outperforms traditional imputation-based pipelines and kernel methods across a wide range of missing data settings. Code is available at: https://github.com/echoid/Incomplete-Heter-Kernel
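The MaxU idea of giving unobserved entries the largest plausible dissimilarity can be illustrated with a toy mixed-type distance, sketched below. This is not the probability mass-based measure from the paper: the range-normalised numerical difference, the simple nominal mismatch, the averaging step, and all names and values are assumptions chosen to keep the example short.

```python
def feature_dissimilarity(a, b, kind, value_range=1.0):
    """Per-feature dissimilarity in [0, 1]; None marks a missing entry."""
    if a is None or b is None:
        return 1.0  # MaxU-style rule: unobserved entries get the maximum
    if kind == "numerical":
        return min(1.0, abs(a - b) / value_range)
    return 0.0 if a == b else 1.0  # nominal features: simple mismatch

def record_dissimilarity(x, y, kinds, ranges):
    """Average the per-feature dissimilarities of two records."""
    parts = [feature_dissimilarity(a, b, k, r)
             for a, b, k, r in zip(x, y, kinds, ranges)]
    return sum(parts) / len(parts)

# Hypothetical records with two numerical features and one nominal feature.
kinds = ["numerical", "nominal", "numerical"]
ranges = [10.0, 1.0, 100.0]
x = [2.0, "red", 50.0]
y = [4.0, "red", None]   # last feature is missing in y

d = record_dissimilarity(x, y, kinds, ranges)
print(round(d, 3))
```

Because no value is filled in for the missing entry, nothing is imputed; the missing feature simply contributes the most pessimistic dissimilarity.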

LG Aug 6, 2025 Code
MissMecha: An All-in-One Python Package for Studying Missing Data Mechanisms

Youran Zhou, Mohamed Reda Bouadjenek, Sunil Aryal

Incomplete data is a persistent challenge in real-world datasets, often governed by complex and unobservable missingness mechanisms. Simulating missingness has become a standard approach for understanding its impact on learning and analysis. However, existing tools are fragmented, mechanism-limited, and typically focus only on numerical variables, overlooking the heterogeneous nature of real-world tabular data. We present MissMecha, an open-source Python toolkit for simulating, visualizing, and evaluating missing data under MCAR, MAR, and MNAR assumptions. MissMecha supports both numerical and categorical features, enabling mechanism-aware studies across mixed-type tabular datasets. It includes visual diagnostics, MCAR testing utilities, and type-aware imputation evaluation metrics. Designed to support data quality research, benchmarking, and education, MissMecha offers a unified platform for researchers and practitioners working with incomplete data.
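To make the three mechanisms concrete, the sketch below simulates each one on a tiny table using only the standard library. It does not use or imitate the MissMecha API; the function names, the column names, and the deterministic threshold rules for MAR and MNAR are all hypothetical simplifications (real simulators usually make the dependence probabilistic).

```python
import random

def mcar(rows, column, rate, seed=0):
    """MCAR: every value in `column` is dropped with the same probability."""
    rng = random.Random(seed)
    return [{**r, column: None} if rng.random() < rate else dict(r) for r in rows]

def mar(rows, column, driver, threshold):
    """MAR: missingness in `column` depends on another, observed column."""
    return [{**r, column: None} if r[driver] > threshold else dict(r) for r in rows]

def mnar(rows, column, threshold):
    """MNAR: missingness in `column` depends on its own (unobserved) value."""
    return [{**r, column: None} if r[column] > threshold else dict(r) for r in rows]

# Made-up records for illustration only.
rows = [
    {"age": 25, "income": 30_000},
    {"age": 40, "income": 90_000},
    {"age": 60, "income": 50_000},
]

mcar_rows = mcar(rows, "income", rate=0.5)
mar_rows = mar(rows, "income", driver="age", threshold=50)   # older people skip
mnar_rows = mnar(rows, "income", threshold=80_000)           # high earners skip
print(mar_rows)
print(mnar_rows)
```

The MAR and MNAR outputs look alike once the values are gone, which is why the mechanism is generally unobservable from the incomplete data alone.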

CR Feb 17, 2024
MLSTL-WSN: Machine Learning-based Intrusion Detection using SMOTETomek in WSNs

Md. Alamin Talukder, Selina Sharmin, Md Ashraf Uddin et al.

Wireless Sensor Networks (WSNs) play a pivotal role as infrastructures, encompassing both stationary and mobile sensors. These sensors self-organize and establish multi-hop connections for communication, collectively sensing, gathering, processing, and transmitting data about their surroundings. Despite their significance, WSNs face rapid and detrimental attacks that can disrupt functionality. Existing intrusion detection methods for WSNs encounter challenges such as low detection rates, computational overhead, and false alarms. These issues stem from sensor node resource constraints, data redundancy, and high correlation within the network. To address these challenges, we propose an innovative intrusion detection approach that integrates Machine Learning (ML) techniques with the Synthetic Minority Oversampling Technique Tomek Link (SMOTE-TomekLink) algorithm. This combination synthesizes minority instances and eliminates Tomek links, resulting in a balanced dataset that significantly enhances detection accuracy in WSNs. Additionally, we incorporate feature scaling through standardization to render input features consistent and scalable, facilitating more precise training and detection. To counteract imbalanced WSN datasets, we employ the SMOTE-Tomek resampling technique, mitigating overfitting and underfitting issues. Our comprehensive evaluation, using the WSN Dataset (WSN-DS) containing 374,661 records, identifies the optimal model for intrusion detection in WSNs. In binary classification, our model achieves an accuracy of 99.78%, and in multiclass classification, it attains an accuracy of 99.92%. These findings underscore the efficiency of our proposal in the context of WSN intrusion detection, showcasing its effectiveness in detecting and mitigating intrusions in WSNs.
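The Tomek-link half of this resampling step can be sketched in a few lines: a Tomek link is a pair of samples from different classes that are each other's nearest neighbour, and such pairs are candidates for removal. The sketch below is a brute-force, standard-library illustration on made-up points; the function names are hypothetical, the SMOTE oversampling half is not shown, and this is not the pipeline used in the paper.

```python
import math

def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding itself."""
    others = [j for j in range(len(points)) if j != i]
    return min(others, key=lambda j: math.dist(points[i], points[j]))

def tomek_links(points, labels):
    """Pairs of mutually nearest neighbours that carry different labels."""
    links = []
    for i in range(len(points)):
        j = nearest_neighbor(i, points)
        # Record each pair once (i < j) and require the relation to be mutual.
        if i < j and labels[i] != labels[j] and nearest_neighbor(j, points) == i:
            links.append((i, j))
    return links

# Made-up 2-D samples: points 0 and 1 sit on the class boundary.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.2, 1.0), (5.0, 5.0)]
labels = ["normal", "attack", "normal", "normal", "attack"]

links = tomek_links(points, labels)
print(links)
```

Removing one or both members of each link cleans up the class boundary after oversampling; which member is removed is a design choice left open here.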

MM Jun 3, 2025
Omnidirectional Video Super-Resolution using Deep Learning

Arbind Agrahari Baniya, Tsz-Kwan Lee, Peter W. Eklund et al.

Omnidirectional Videos (or 360° videos) are widely used in Virtual Reality (VR) to facilitate immersive and interactive viewing experiences. However, the limited spatial resolution in 360° videos does not allow for each degree of view to be represented with adequate pixels, limiting the visual quality offered in the immersive experience. Deep learning Video Super-Resolution (VSR) techniques used for conventional videos could provide a promising software-based solution; however, these techniques do not tackle the distortion present in equirectangular projections of 360° video signals. An additional obstacle is the limited availability of 360° video datasets for study. To address these issues, this paper creates a novel 360° Video Dataset (360VDS) with a study of the extensibility of conventional VSR models to 360° videos. This paper further proposes a novel deep learning model for 360° Video Super-Resolution (360° VSR), called Spherical Signal Super-resolution with a Proportioned Optimisation (S3PO). S3PO adopts recurrent modelling with an attention mechanism, unbound from conventional VSR techniques like alignment. With a purpose-built feature extractor and a novel loss function addressing spherical distortion, S3PO outperforms most state-of-the-art conventional VSR models and 360°-specific super-resolution models on 360° video datasets. A step-wise ablation study is presented to understand and demonstrate the impact of the chosen architectural sub-components, targeted training and optimisation.

LG Mar 5, 2024
Training Machine Learning models at the Edge: A Survey

Aymen Rayane Khouas, Mohamed Reda Bouadjenek, Hakim Hacid et al.

Edge computing has gained significant traction in recent years, promising enhanced efficiency by integrating artificial intelligence capabilities at the edge. While the focus has primarily been on the deployment and inference of Machine Learning (ML) models at the edge, the training aspect remains less explored. This survey explores the concept of edge learning, specifically the optimization of ML model training at the edge. The objective is to comprehensively explore diverse approaches and methodologies in edge learning, synthesize existing knowledge, identify challenges, and highlight future trends. Using Scopus and Web of Science advanced search, relevant literature on edge learning was identified, revealing a concentration of research efforts in distributed learning methods, particularly federated learning. This survey further provides a guideline for comparing techniques used to optimize ML for edge learning, along with an exploration of the different frameworks, libraries, and simulation tools available. In doing so, the paper contributes to a holistic understanding of the current landscape and future directions in the intersection of edge computing and machine learning, paving the way for informed comparisons between optimization methods and techniques designed for training on the edge.

AI Mar 26, 2024
The Pursuit of Fairness in Artificial Intelligence Models: A Survey

Tahsin Alamgir Kheya, Mohamed Reda Bouadjenek, Sunil Aryal

Artificial Intelligence (AI) models are now being utilized in all facets of our lives such as healthcare, education and employment. Since they are used in numerous sensitive environments and make decisions that can be life altering, potential biased outcomes are a pressing matter. Developers should ensure that such models don't manifest any unexpected discriminatory practices like partiality for certain genders, ethnicities or disabled people. With the ubiquitous dissemination of AI systems, researchers and practitioners are becoming more aware of unfair models and are bound to mitigate bias in them. Significant research has been conducted in addressing such issues to ensure models don't intentionally or unintentionally perpetuate bias. This survey offers a synopsis of the different ways researchers have promoted fairness in AI systems. We explore the different definitions of fairness existing in the current literature. We create a comprehensive taxonomy by categorizing different types of bias and investigate cases of biased AI in different application domains. A thorough study is conducted of the approaches and techniques employed by researchers to mitigate bias in AI models. Moreover, we also delve into the impact of biased models on user experience and the ethical considerations to contemplate when developing and deploying such models. We hope this survey helps researchers and practitioners understand the intricate details of fairness and bias in AI systems. By sharing this thorough survey, we aim to promote additional discourse in the domain of equitable and responsible AI.

ME Apr 7, 2024
Review for Handling Missing Data with special missing mechanism

Youran Zhou, Sunil Aryal, Mohamed Reda Bouadjenek

Missing data poses a significant challenge in data science, affecting decision-making processes and outcomes. Understanding what missing data is, how it occurs, and why it is crucial to handle it appropriately is paramount when working with real-world data, especially tabular data, one of the most commonly used data types in the real world. Three missingness mechanisms are defined in the literature: Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR), each presenting unique challenges in imputation. Most existing work focuses on MCAR, which is relatively easy to handle. The special missingness mechanisms of MNAR and MAR are less explored and understood. This article reviews existing literature on handling missing values. It compares and contrasts existing methods in terms of their ability to handle different missingness mechanisms and data types. It identifies research gaps in the existing literature and lays out potential directions for future research in the field. The information in this review will help data analysts and researchers to adopt and promote good practices for handling missing data in real-world problems.

CR Mar 17, 2024
usfAD Based Effective Unknown Attack Detection Focused IDS Framework

Md. Ashraf Uddin, Sunil Aryal, Mohamed Reda Bouadjenek et al.

The rapid expansion of varied network systems, including the Internet of Things (IoT) and Industrial Internet of Things (IIoT), has led to an increasing range of cyber threats. Ensuring robust protection against these threats necessitates the implementation of an effective Intrusion Detection System (IDS). For more than a decade, researchers have explored supervised machine learning techniques to develop IDSs that classify normal and attack traffic. However, building effective IDS models using supervised learning requires a substantial number of benign and attack samples. Collecting a sufficient number of attack samples from real-life scenarios is not possible since cyber attacks occur only occasionally. Further, IDSs trained and tested on known datasets fail to detect zero-day or unknown attacks due to the swift evolution of attack patterns. To address this challenge, we put forth two strategies for semi-supervised learning based IDSs where training samples of attacks are not required: 1) training a supervised machine learning model using randomly and uniformly dispersed synthetic attack samples; 2) building a One Class Classification (OCC) model that is trained exclusively on benign network traffic. We have implemented both approaches and compared their performance using 10 recent benchmark IDS datasets. Our findings demonstrate that the OCC model based on the state-of-the-art anomaly detection technique called usfAD significantly outperforms conventional supervised classification and other OCC-based techniques when trained and tested under real-life scenarios, particularly in detecting previously unseen attacks.
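The first strategy in this abstract, training on uniformly dispersed synthetic attack samples, might look roughly like the sketch below. The abstract does not say which region the samples are drawn from; sampling uniformly within the bounding box of the benign data is an assumption made here, as are the function name, the feature values, and the label encoding.

```python
import random

def uniform_synthetic_attacks(benign, n_samples, seed=0):
    """Draw points uniformly from the bounding box of the benign data
    (the choice of region is an assumption, not taken from the paper)."""
    rng = random.Random(seed)
    dims = len(benign[0])
    lows = [min(row[d] for row in benign) for d in range(dims)]
    highs = [max(row[d] for row in benign) for d in range(dims)]
    return [[rng.uniform(lows[d], highs[d]) for d in range(dims)]
            for _ in range(n_samples)]

# Made-up benign traffic features (two features per record).
benign = [[0.2, 10.0], [0.4, 12.0], [0.3, 11.0]]
synthetic = uniform_synthetic_attacks(benign, n_samples=5)

# Combined training set for an ordinary binary classifier:
# label 0 = benign, label 1 = synthetic "attack".
features = benign + synthetic
labels = [0] * len(benign) + [1] * len(synthetic)
print(len(features), labels)
```

Any supervised classifier can then be fitted to `features` and `labels` without a single real attack sample, which is the point of the strategy.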

IV Jun 3, 2025
A Survey of Deep Learning Video Super-Resolution

Arbind Agrahari Baniya, Tsz-Kwan Lee, Peter Eklund et al.

Video super-resolution (VSR) is a prominent research topic in low-level computer vision, where deep learning technologies have played a significant role. The rapid progress in deep learning and its applications in VSR has led to a proliferation of tools and techniques in the literature. However, the usage of these methods is often not adequately explained, and decisions are primarily driven by quantitative improvements. Given the significance of VSR's potential influence across multiple domains, it is imperative to conduct a comprehensive analysis of the elements and deep learning methodologies employed in VSR research. This methodical analysis will facilitate the informed development of models tailored to specific application needs. In this paper, we present an overarching overview of deep learning-based video super-resolution models, investigating each component and discussing its implications. Furthermore, we provide a synopsis of key components and technologies employed by state-of-the-art and earlier VSR models. By elucidating the underlying methodologies and categorising them systematically, we identified trends, requirements, and challenges in the domain. As a first-of-its-kind survey of deep learning-based VSR models, this work also establishes a multi-level taxonomy to guide current and future VSR research, enhancing the maturation and interpretation of VSR practices for various practical applications.

CV Jan 7, 2025
Visual question answering: from early developments to recent advances -- a survey

Ngoc Dung Huynh, Mohamed Reda Bouadjenek, Sunil Aryal et al.

Visual Question Answering (VQA) is an evolving research field aimed at enabling machines to answer questions about visual content by integrating image and language processing techniques such as feature extraction, object detection, text embedding, natural language understanding, and language generation. With the growth of multimodal data research, VQA has gained significant attention due to its broad applications, including interactive educational tools, medical image diagnosis, customer service, entertainment, and social media captioning. Additionally, VQA plays a vital role in assisting visually impaired individuals by generating descriptive content from images. This survey introduces a taxonomy of VQA architectures, categorizing them based on design choices and key components to facilitate comparative analysis and evaluation. We review major VQA approaches, focusing on deep learning-based methods, and explore the emerging field of Large Visual Language Models (LVLMs) that have demonstrated success in multimodal tasks like VQA. The paper further examines available datasets and evaluation metrics essential for measuring VQA system performance, followed by an exploration of real-world VQA applications. Finally, we highlight ongoing challenges and future directions in VQA research, presenting open questions and potential areas for further development. This survey serves as a comprehensive resource for researchers and practitioners interested in the latest advancements and future

CR Mar 17, 2024
A Dual-Tier Adaptive One-Class Classification IDS for Emerging Cyberthreats

Md. Ashraf Uddin, Sunil Aryal, Mohamed Reda Bouadjenek et al.

In today's digital age, our dependence on IoT (Internet of Things) and IIoT (Industrial IoT) systems has grown immensely. These systems facilitate sensitive activities such as banking transactions and exchanges of personal data, enterprise data, and legal documents. Cyberattackers consistently exploit weak security measures and tools. The Network Intrusion Detection System (IDS) acts as a primary tool against such cyber threats. However, machine learning-based IDSs, when trained on specific attack patterns, often misclassify newly emerging cyberattacks. Moreover, the limited availability of attack instances for training a supervised learner and the ever-evolving nature of cyber threats further complicate the matter. This emphasizes the need for an adaptable IDS framework capable of recognizing and learning from unfamiliar or unseen attacks over time. In this research, we propose a one-class classification-driven IDS structured on two tiers. The first tier distinguishes between normal activities and attacks or threats, while the second tier determines whether the detected attack is known or unknown. Within this second tier, we also embed a multi-class classification mechanism coupled with a clustering algorithm. This model not only identifies unseen attacks but also clusters them and uses them for retraining. This enables our model to be future-proof, capable of evolving with emerging threat patterns. By leveraging one-class classifiers (OCC) at the first level, our approach bypasses the need for attack samples, addressing data imbalance and zero-day attack concerns, while OCC at the second level can effectively separate unknown attacks from known attacks. Our methodology and evaluations indicate that the presented framework exhibits promising potential for real-world deployments.

CR Mar 17, 2024
Hierarchical Classification for Intrusion Detection System: Effective Design and Empirical Analysis

Md. Ashraf Uddin, Sunil Aryal, Mohamed Reda Bouadjenek et al.

With the increased use of network technologies like the Internet of Things (IoT) in many real-world applications, new types of cyberattacks have been emerging. To safeguard critical infrastructures from these emerging threats, it is crucial to deploy an Intrusion Detection System (IDS) that can detect different types of attacks accurately while minimizing false alarms. Machine learning approaches have been used extensively in IDSs, mainly in the form of flat multi-class classification that differentiates normal traffic from different types of attacks. Though cyberattack types exhibit a hierarchical structure, where similar granular attack subtypes can be grouped into higher-level attack types, the hierarchical classification approach has not been explored well. In this paper, we investigate the effectiveness of the hierarchical classification approach in IDSs. We use a three-level hierarchical classification model to classify various network attacks, where the first level classifies traffic as benign or attack, the second level classifies coarse high-level attack types, and the third level classifies granular attack types. Our empirical results, obtained with 10 different classification algorithms on 10 different datasets, show that there is no significant difference in overall classification performance (i.e., detecting normal traffic and different types of attack correctly) between the hierarchical and flat classification approaches. However, the flat classification approach tends to misclassify attacks as normal, whereas the hierarchical approach tends to misclassify one type of attack as another attack type. In other words, the hierarchical classification approach significantly reduces the misclassification of attacks as normal traffic, which is more important in critical systems.
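The three-level label structure and the error type highlighted in this abstract can be sketched as follows. The attack names, the hierarchy, and both sets of predictions are invented purely to illustrate the distinction between "attack missed as benign" and "attack confused with another attack"; none of them come from the paper's datasets or results.

```python
# Hypothetical hierarchy: fine-grained subtype -> coarse attack type.
COARSE = {"syn_flood": "dos", "udp_flood": "dos",
          "port_scan": "recon", "os_fingerprint": "recon"}

def label_path(fine_label):
    """Expand a fine-grained label into (level 1, level 2, level 3)."""
    if fine_label == "benign":
        return ("benign", None, None)
    return ("attack", COARSE[fine_label], fine_label)

def missed_attacks(true_labels, predicted_labels):
    """Count attacks predicted as benign, the costliest error for an IDS."""
    return sum(1 for t, p in zip(true_labels, predicted_labels)
               if t != "benign" and p == "benign")

# Invented predictions with the same overall accuracy (3 of 4 correct).
truth = ["syn_flood", "port_scan", "benign", "udp_flood"]
flat_pred = ["benign", "port_scan", "benign", "udp_flood"]     # attack missed
hier_pred = ["udp_flood", "port_scan", "benign", "udp_flood"]  # wrong subtype

print(label_path("syn_flood"))
print(missed_attacks(truth, flat_pred), missed_attacks(truth, hier_pred))
```

Both prediction lists score the same on plain accuracy, yet only the first lets an attack through as normal traffic, which is why a per-error-type count is worth reporting alongside accuracy.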

GNFeb 27, 2024
Exploring Gene Regulatory Interaction Networks and predicting therapeutic molecules for Hypopharyngeal Cancer and EGFR-mutated lung adenocarcinoma

Abanti Bhattacharjya, Md Manowarul Islam, Md Ashraf Uddin et al.

With the advent of information technology, the Bioinformatics research field is becoming increasingly attractive to researchers and academicians. The recent development of various Bioinformatics toolkits has facilitated the rapid processing and analysis of vast quantities of biological data for human perception. Most studies focus on locating two connected diseases and making some observations to construct diverse gene regulatory interaction networks, a forerunner to general drug design for curing illness. For instance, Hypopharyngeal cancer is a disease that is associated with EGFR-mutated lung adenocarcinoma. In this study, we select EGFR-mutated lung adenocarcinoma and Hypopharyngeal cancer by finding the lung metastases in hypopharyngeal cancer. To conduct this study, we collect Microarray datasets from GEO (Gene Expression Omnibus), an online database controlled by NCBI. Differentially expressed genes, common genes, and hub genes between the two selected diseases are detected for the subsequent steps. Our research findings suggest common therapeutic molecules for the selected diseases based on 10 hub genes with the highest interactions according to the degree topology method and the maximum clique centrality (MCC). Our suggested therapeutic molecules will be fruitful for patients with those two diseases simultaneously.

AIFeb 25, 2025
Unmasking Gender Bias in Recommendation Systems and Enhancing Category-Aware Fairness

Tahsin Alamgir Kheya, Mohamed Reda Bouadjenek, Sunil Aryal

Recommendation systems are now an integral part of our daily lives. We rely on them for tasks such as discovering new movies, finding friends on social media, and connecting job seekers with relevant opportunities. Given their vital role, we must ensure these recommendations are free from societal stereotypes. Therefore, evaluating and addressing such biases in recommendation systems is crucial. Previous work evaluating the fairness of recommended items fails to capture certain nuances as they mainly focus on comparing performance metrics for different sensitive groups. In this paper, we introduce a set of comprehensive metrics for quantifying gender bias in recommendations. Specifically, we show the importance of evaluating fairness on a more granular level, which can be achieved using our metrics to capture gender bias using categories of recommended items like genres for movies. Furthermore, we show that employing a category-aware fairness metric as a regularization term along with the main recommendation loss during training can help effectively minimize bias in the models' output. We experiment on three real-world datasets, using five baseline models alongside two popular fairness-aware models, to show the effectiveness of our metrics in evaluating gender bias. Our metrics help provide an enhanced insight into bias in recommended items compared to previous metrics. Additionally, our results demonstrate how incorporating our regularization term significantly improves the fairness in recommendations for different categories without substantial degradation in overall recommendation performance.

LGJul 25, 2025
Handling Out-of-Distribution Data: A Survey

Lakpa Tamang, Mohamed Reda Bouadjenek, Richard Dazeley et al.

In the field of Machine Learning (ML) and data-driven applications, one of the significant challenges is the change in data distribution between the training and deployment stages, commonly known as distribution shift. This paper outlines different mechanisms for handling two main types of distribution shifts: (i) Covariate shift: where the values of features or covariates change between training and test data, and (ii) Concept/Semantic-shift: where the model experiences a shift in the concept learned during training due to the emergence of novel classes in the test phase. Our contributions are threefold. First, we formalize distribution shifts, describe how conventional methods fail to handle them adequately, and argue for a model that can simultaneously perform better in all types of distribution shifts. Second, we discuss why handling distribution shifts is important and provide an extensive review of the methods and techniques that have been developed to detect, measure, and mitigate the effects of these shifts. Third, we discuss the current state of distribution shift handling mechanisms and propose future research directions in this area. Overall, we provide a retrospective synopsis of the literature on distribution shift, focusing on OOD data that has been overlooked in the existing surveys.

CVMay 6, 2025
Deep Learning for Sports Video Event Detection: Tasks, Datasets, Methods, and Challenges

Hao Xu, Arbind Agrahari Baniya, Sam Well et al.

Video event detection has become a cornerstone of modern sports analytics, powering automated performance evaluation, content generation, and tactical decision-making. Recent advances in deep learning have driven progress in related tasks such as Temporal Action Localization (TAL), which detects extended action segments; Action Spotting (AS), which identifies a representative timestamp; and Precise Event Spotting (PES), which pinpoints the exact frame of an event. Although closely connected, their subtle differences often blur the boundaries between them, leading to confusion in both research and practical applications. Furthermore, prior surveys either address generic video event detection or broader sports video tasks, but largely overlook the unique temporal granularity and domain-specific challenges of event spotting. In addition, most existing sports video surveys focus on elite-level competitions while neglecting the wider community of everyday practitioners. This survey addresses these gaps by: (i) clearly delineating TAL, AS, and PES and their respective use cases; (ii) introducing a structured taxonomy of state of the art approaches including temporal modeling strategies, multimodal frameworks, and data-efficient pipelines tailored for AS and PES; and (iii) critically assessing benchmark datasets and evaluation protocols, highlighting limitations such as reliance on broadcast quality footage and metrics that over reward permissive multilabel predictions. By synthesizing current research and exposing open challenges, this work provides a comprehensive foundation for developing temporally precise, generalizable, and practically deployable sports event detection systems for both the research and industry communities.

AIAug 5, 2025
MissDDIM: Deterministic and Efficient Conditional Diffusion for Tabular Data Imputation

Youran Zhou, Mohamed Reda Bouadjenek, Sunil Aryal

Diffusion models have recently emerged as powerful tools for missing data imputation by modeling the joint distribution of observed and unobserved variables. However, existing methods, typically based on stochastic denoising diffusion probabilistic models (DDPMs), suffer from high inference latency and variable outputs, limiting their applicability in real-world tabular settings. To address these deficiencies, we present in this paper MissDDIM, a conditional diffusion framework that adapts Denoising Diffusion Implicit Models (DDIM) for tabular imputation. While stochastic sampling enables diverse completions, it also introduces output variability that complicates downstream processing.
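
For reference, the deterministic sampling step of a standard DDIM (the case with no injected noise) can be written as below. This is the generic DDIM update, not necessarily MissDDIM's exact formulation: the conditioning argument $c$ (the observed entries) is an assumed notation, and $\bar{\alpha}_t$ denotes the cumulative noise schedule in DDPM notation.

```latex
\hat{x}_0 = \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t, c)}{\sqrt{\bar{\alpha}_t}},
\qquad
x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\,\hat{x}_0 + \sqrt{1-\bar{\alpha}_{t-1}}\,\epsilon_\theta(x_t, t, c)
```

Because no random term appears in the update, the same starting point $x_T$ and the same conditioning always yield the same imputation, and the reverse process can skip timesteps, which is where the lower latency comes from.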

CVJul 10, 2025
Multi-Scale Attention and Gated Shifting for Fine-Grained Event Spotting in Videos

Hao Xu, Sam Wells, Mohamed Reda Bouadjenek et al.

Precise Event Spotting (PES) in sports videos requires frame-level recognition of fine-grained actions from single-camera footage. Existing PES models typically incorporate lightweight temporal modules such as the Gate Shift Module (GSM) or the Gate Shift Fuse to enrich 2D CNN feature extractors with temporal context. However, these modules are limited in both temporal receptive field and spatial adaptability. We propose a Multi-Scale Attention Gate Shift Module (MSAGSM) that enhances GSM with multi-scale temporal shifts and channel grouped spatial attention, enabling efficient modeling of both short and long-term dependencies while focusing on salient regions. MSAGSM is a lightweight, plug-and-play module that integrates seamlessly with diverse 2D backbones. To further advance the field, we introduce the Table Tennis Australia dataset, the first PES benchmark for table tennis containing over 4,800 precisely annotated events. Extensive experiments across four PES benchmarks demonstrate that MSAGSM consistently improves performance with minimal overhead, setting new state-of-the-art results.

LGFeb 27, 2025
Developing robust methods to handle missing data in real-world applications effectively

Youran Zhou, Mohamed Reda Bouadjenek, Sunil Aryal

Missing data is a pervasive challenge spanning diverse data types, including tabular, sensor data, time-series, images and so on. Its origins are multifaceted, resulting in various missing mechanisms. Prior research in this field has predominantly revolved around the assumption of the Missing Completely At Random (MCAR) mechanism. However, Missing At Random (MAR) and Missing Not At Random (MNAR) mechanisms, though equally prevalent, have often remained underexplored despite their significant influence. This PhD project presents a comprehensive research agenda designed to investigate the implications of diverse missing data mechanisms. The principal aim is to devise robust methodologies capable of effectively handling missing data while accommodating the unique characteristics of MCAR, MAR, and MNAR mechanisms. By addressing these gaps, this research contributes to an enriched understanding of the challenges posed by missing data across various industries and data modalities. It seeks to provide practical solutions that enable the effective management of missing data, empowering researchers and practitioners to leverage incomplete datasets confidently.

LGNov 18, 2025
MissHDD: Hybrid Deterministic Diffusion for Heterogeneous Incomplete Data Imputation

Youran Zhou, Mohamed Reda Bouadjenek, Sunil Aryal

Incomplete data are common in real-world tabular applications, where numerical, categorical, and discrete attributes coexist within a single dataset. This heterogeneous structure presents significant challenges for existing diffusion-based imputation models, which typically assume a homogeneous feature space and rely on stochastic denoising trajectories. Such assumptions make it difficult to maintain conditional consistency, and they often lead to information collapse for categorical variables or instability when numerical variables require deterministic updates. These limitations indicate that a single diffusion process is insufficient for mixed-type tabular imputation. We propose a hybrid deterministic diffusion framework that separates heterogeneous features into two complementary generative channels. A continuous DDIM-based channel provides efficient and stable deterministic denoising for numerical variables, while a discrete latent-path diffusion channel, inspired by loopholing-based discrete diffusion, models categorical and discrete features without leaving their valid sample manifolds. The two channels are trained under a unified conditional imputation objective, enabling coherent reconstruction of mixed-type incomplete data. Extensive experiments on multiple real-world datasets show that the proposed framework achieves higher imputation accuracy, more stable sampling trajectories, and improved robustness across MCAR, MAR, and MNAR settings compared with existing diffusion-based and classical methods. These results demonstrate the importance of structure-aware diffusion processes for advancing deep learning approaches to incomplete tabular data.

SDSep 9, 2025
Spectral Masking and Interpolation Attack (SMIA): A Black-box Adversarial Attack against Voice Authentication and Anti-Spoofing Systems

Kamel Kamel, Hridoy Sankar Dutta, Keshav Sood et al.

Voice Authentication Systems (VAS) use unique vocal characteristics for verification. They are increasingly integrated into high-security sectors such as banking and healthcare. Despite their improvements using deep learning, they face severe vulnerabilities from sophisticated threats like deepfakes and adversarial attacks. The emergence of realistic voice cloning complicates detection, as systems struggle to distinguish authentic from synthetic audio. While anti-spoofing countermeasures (CMs) exist to mitigate these risks, many rely on static detection models that can be bypassed by novel adversarial methods, leaving a critical security gap. To demonstrate this vulnerability, we propose the Spectral Masking and Interpolation Attack (SMIA), a novel method that strategically manipulates inaudible frequency regions of AI-generated audio. By altering the voice in zones imperceptible to the human ear, SMIA creates adversarial samples that sound authentic while deceiving CMs. We conducted a comprehensive evaluation of our attack against state-of-the-art (SOTA) models across multiple tasks, under simulated real-world conditions. SMIA achieved a strong attack success rate (ASR) of at least 82% against combined VAS/CM systems, at least 97.5% against standalone speaker verification systems, and 100% against countermeasures. These findings conclusively demonstrate that current security postures are insufficient against adaptive adversarial attacks. This work highlights the urgent need for a paradigm shift toward next-generation defenses that employ dynamic, context-aware frameworks capable of evolving with the threat landscape.

CRAug 22, 2025
A Survey of Threats Against Voice Authentication and Anti-Spoofing Systems

Kamel Kamel, Keshav Sood, Hridoy Sankar Dutta et al.

Voice authentication has undergone significant changes from traditional systems that relied on handcrafted acoustic features to deep learning models that can extract robust speaker embeddings. This advancement has expanded its applications across finance, smart devices, law enforcement, and beyond. However, as adoption has grown, so have the threats. This survey presents a comprehensive review of the modern threat landscape targeting Voice Authentication Systems (VAS) and Anti-Spoofing Countermeasures (CMs), including data poisoning, adversarial, deepfake, and adversarial spoofing attacks. We chronologically trace the development of voice authentication and examine how vulnerabilities have evolved in tandem with technological advancements. For each category of attack, we summarize methodologies, highlight commonly used datasets, compare performance and limitations, and organize existing literature using widely accepted taxonomies. By highlighting emerging risks and open challenges, this survey aims to support the development of more secure and resilient voice authentication systems.

CVJul 8, 2025
Hyperspectral Anomaly Detection Methods: A Survey and Comparative Study

Aayushma Pant, Arbind Agrahari Baniya, Tsz-Kwan Lee et al.

Hyperspectral images are high-dimensional datasets comprising hundreds of contiguous spectral bands, enabling detailed analysis of materials and surfaces. Hyperspectral anomaly detection (HAD) refers to the technique of identifying and locating anomalous targets in such data without prior information about a hyperspectral scene or target spectrum. This technology has seen rapid advancements in recent years, with applications in agriculture, defence, military surveillance, and environmental monitoring. Despite this significant progress, existing HAD methods continue to face challenges such as high computational complexity, sensitivity to noise, and limited generalisation across diverse datasets. This study presents a comprehensive comparison of various HAD techniques, categorising them into statistical models, representation-based methods, classical machine learning approaches, and deep learning models. We evaluated these methods across 17 benchmarking datasets using different performance metrics, such as ROC, AUC, and separability map to analyse detection accuracy, computational efficiency, their strengths, limitations, and directions for future research. Our findings highlight that deep learning models achieved the highest detection accuracy, while statistical models demonstrated exceptional speed across all datasets. This survey aims to provide valuable insights for researchers and practitioners working to advance the field of hyperspectral anomaly detection methods.

LGJul 2, 2025
Far From Sight, Far From Mind: Inverse Distance Weighting for Graph Federated Recommendation

Aymen Rayane Khouas, Mohamed Reda Bouadjenek, Hakim Hacid et al.

Graph federated recommendation systems offer a privacy-preserving alternative to traditional centralized recommendation architectures, which often raise concerns about data security. While federated learning enables personalized recommendations without exposing raw user data, existing aggregation methods overlook the unique properties of user embeddings in this setting. Indeed, traditional aggregation methods fail to account for their complexity and the critical role of user similarity in recommendation effectiveness. Moreover, evolving user interactions require adaptive aggregation while preserving the influence of high-relevance anchor users (the primary users before expansion in graph-based frameworks). To address these limitations, we introduce Dist-FedAvg, a novel distance-based aggregation method designed to enhance personalization and aggregation efficiency in graph federated learning. Our method assigns higher aggregation weights to users with similar embeddings, while ensuring that anchor users retain significant influence in local updates. Empirical evaluations on multiple datasets demonstrate that Dist-FedAvg consistently outperforms baseline aggregation techniques, improving recommendation accuracy while maintaining seamless integration into existing federated learning frameworks.
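
The idea of weighting similar users more heavily while keeping the anchor user influential can be sketched as below. This is only an illustration of inverse distance weighting under assumed details: the function name `idw_aggregate`, the `anchor_weight` mixing factor, the Euclidean distance, and the `eps` term are all hypothetical choices, and the actual Dist-FedAvg rule may differ.

```python
import math


def idw_aggregate(anchor, neighbours, anchor_weight=0.5, eps=1e-8):
    """Blend an anchor embedding with neighbours weighted by inverse distance.

    Returns the aggregated embedding and the normalised neighbour weights.
    """
    # Closer (more similar) neighbours receive larger raw weights.
    inverse = [1.0 / (math.dist(anchor, n) + eps) for n in neighbours]
    total = sum(inverse)
    weights = [w / total for w in inverse]

    dim = len(anchor)
    # Weighted average of the neighbour embeddings.
    mixed = [sum(w * n[i] for w, n in zip(weights, neighbours)) for i in range(dim)]
    # The anchor keeps a fixed share of influence (assumed mixing rule).
    aggregated = [
        anchor_weight * anchor[i] + (1.0 - anchor_weight) * mixed[i]
        for i in range(dim)
    ]
    return aggregated, weights


aggregated, weights = idw_aggregate([0.0, 0.0], [[1.0, 0.0], [0.0, 4.0]])
print(weights)      # the neighbour at distance 1 outweighs the one at distance 4
print(aggregated)
```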

IRJun 23, 2025
Bias vs Bias -- Dawn of Justice: A Fair Fight in Recommendation Systems

Tahsin Alamgir Kheya, Mohamed Reda Bouadjenek, Sunil Aryal

Recommendation systems play a crucial role in our daily lives by impacting user experience across various domains, including e-commerce, job advertisements, entertainment, etc. Given the vital role of such systems in our lives, practitioners must ensure they do not produce unfair and imbalanced recommendations. Previous work addressing bias in recommendations overlooked bias in certain item categories, potentially leaving some biases unaddressed. Additionally, most previous work on fair re-ranking focused on binary-sensitive attributes. In this paper, we address these issues by proposing a fairness-aware re-ranking approach that helps mitigate bias in different categories of items. This re-ranking approach leverages existing biases to correct disparities in recommendations across various demographic groups. We show how our approach can mitigate bias on multiple sensitive attributes, including gender, age, and occupation. We experimented on three real-world datasets to evaluate the effectiveness of our re-ranking scheme in mitigating bias in recommendations. Our results show how this approach helps mitigate social bias with little to no degradation in performance.

LGMay 26, 2025
Rolling Ball Optimizer: Learning by ironing out loss landscape wrinkles

Mohammed Djameleddine Belgoumri, Mohamed Reda Bouadjenek, Hakim Hacid et al.

Training large neural networks (NNs) requires optimizing high-dimensional data-dependent loss functions. The optimization landscape of these functions is often highly complex and textured, even fractal-like, with many spurious local minima, ill-conditioned valleys, degenerate points, and saddle points. Complicating things further is the fact that these landscape characteristics are a function of the data, meaning that noise in the training data can propagate forward and give rise to unrepresentative small-scale geometry. This poses a difficulty for gradient-based optimization methods, which rely on local geometry to compute updates and are, therefore, vulnerable to being derailed by noisy data. In practice, this translates to a strong dependence of the optimization dynamics on the noise in the data, i.e., poor generalization performance. To remediate this problem, we propose a new optimization procedure: Rolling Ball Optimizer (RBO), that breaks this spatial locality by incorporating information from a larger region of the loss landscape in its updates. We achieve this by simulating the motion of a rigid sphere of finite radius rolling on the loss landscape, a straightforward generalization of Gradient Descent (GD) that simplifies into it in the infinitesimal limit. The radius serves as a hyperparameter that determines the scale at which RBO sees the loss landscape, allowing control over the granularity of its interaction therewith. We are motivated by the intuition that the large-scale geometry of the loss landscape is less data-specific than its fine-grained structure, and that it is easier to optimize. We support this intuition by proving that our algorithm has a smoothing effect on the loss function. Evaluation against SGD, SAM, and Entropy-SGD, on MNIST and CIFAR-10/100 demonstrates promising results in terms of convergence speed, training accuracy, and generalization performance.

CVOct 30, 2024
SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset

Ngoc Dung Huynh, Mohamed Reda Bouadjenek, Sunil Aryal et al.

Visual Question Answering (VQA) has emerged as a promising area of research to develop AI-based systems for enabling interactive and immersive learning. Numerous VQA datasets have been introduced to facilitate various tasks, such as answering questions or identifying unanswerable ones. However, most of these datasets are constructed using real-world images, leaving the performance of existing models on cartoon images largely unexplored. Hence, in this paper, we present "SimpsonsVQA", a novel dataset for VQA derived from The Simpsons TV show, designed to promote inquiry-based learning. Our dataset is specifically designed to address not only the traditional VQA task but also to identify irrelevant questions related to images, as well as the reverse scenario where a user provides an answer to a question that the system must evaluate (e.g., as correct, incorrect, or ambiguous). It aims to cater to various visual applications, harnessing the visual content of "The Simpsons" to create engaging and informative interactive systems. SimpsonsVQA contains approximately 23K images, 166K QA pairs, and 500K judgments (https://simpsonsvqa.org). Our experiments show that current large vision-language models like ChatGPT4o underperform in zero-shot settings across all three tasks, highlighting the dataset's value for improving model performance on cartoon images. We anticipate that SimpsonsVQA will inspire further research, innovation, and advancements in inquiry-based learning VQA.

LGJun 1, 2024
Data Quality in Edge Machine Learning: A State-of-the-Art Survey

Mohammed Djameleddine Belgoumri, Mohamed Reda Bouadjenek, Sunil Aryal et al.

Data-driven Artificial Intelligence (AI) systems trained using Machine Learning (ML) are shaping an ever-increasing (in size and importance) portion of our lives, including, but not limited to, recommendation systems, autonomous driving technologies, healthcare diagnostics, financial services, and personalized marketing. On the one hand, the outsized influence of these systems imposes a high standard of quality, particularly in the data used to train them. On the other hand, establishing and maintaining standards of Data Quality (DQ) becomes more challenging due to the proliferation of Edge Computing and Internet of Things devices, along with their increasing adoption for training and deploying ML models. The nature of the edge environment -- characterized by limited resources, decentralized data storage, and processing -- exacerbates data-related issues, making them more frequent, severe, and difficult to detect and mitigate. From these observations, it follows that DQ research for edge ML is a critical and urgent exploration track for the safety and robust usefulness of present and future AI systems. Despite this fact, DQ research for edge ML is still in its infancy. The literature on this subject remains fragmented and scattered across different research communities, with no comprehensive survey to date. Hence, this paper aims to fill this gap by providing a global view of the existing literature from multiple disciplines that can be grouped under the umbrella of DQ for edge ML. Specifically, we present a tentative definition of data quality in Edge computing, which we use to establish a set of DQ dimensions. We explore each dimension in detail, including existing solutions for mitigation.

LGJan 21, 2024
Enabling clustering algorithms to detect clusters of varying densities through scale-invariant data preprocessing

Sunil Aryal, Jonathan R. Wells, Arbind Agrahari Baniya et al.

In this paper, we show that preprocessing data using a variant of rank transformation called 'Average Rank over an Ensemble of Sub-samples (ARES)' makes clustering algorithms robust to data representation and enables them to detect varying density clusters. Our empirical results, obtained using three of the most widely used clustering algorithms, namely KMeans, DBSCAN, and DP (Density Peak), across a wide range of real-world datasets, show that clustering after ARES transformation produces better and more consistent results.

LGNov 8, 2021
A Novel Data Pre-processing Technique: Making Data Mining Robust to Different Units and Scales of Measurement

Arbind Agrahari Baniya, Sunil Aryal, Santosh KC

Many existing data mining algorithms use feature values directly in their model, making them sensitive to units/scales used to measure/represent data. Pre-processing of data based on rank transformation has been suggested as a potential solution to overcome this issue. However, the resulting data after pre-processing with rank transformation is uniformly distributed, which may not be very useful in many data mining applications. In this paper, we present a better and effective alternative based on ranks over multiple sub-samples of data. We call the proposed pre-processing technique ARES (Average Rank over an Ensemble of Sub-samples). Our empirical results with widely used data mining algorithms for classification and anomaly detection on a wide range of data sets suggest that ARES results in more consistent task-specific outcomes across various algorithms and data sets. In addition, it yields better or competitive outcomes most of the time compared to the widely used min-max normalisation and the traditional rank transformation.
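
A rough sketch of averaging ranks over sub-samples, for a single feature, is given below. The details are assumptions made for illustration: the rank of a value is taken as the number of sub-sample values strictly below it, and the function name `ares_transform`, the default ensemble size, and the sub-sample size are hypothetical rather than the paper's settings.

```python
import bisect
import random


def ares_transform(values, n_subsamples=10, subsample_size=8, seed=0):
    """Replace each value by its average rank over random sub-samples."""
    rng = random.Random(seed)
    size = min(subsample_size, len(values))
    # Draw the ensemble of sub-samples once and sort each for fast rank lookup.
    subsamples = [sorted(rng.sample(values, size)) for _ in range(n_subsamples)]
    return [
        # Rank in one sub-sample = count of sub-sample values below v.
        sum(bisect.bisect_left(s, v) for s in subsamples) / n_subsamples
        for v in values
    ]


lengths_km = [1.0, 2.0, 3.5, 10.0, 50.0, 51.0, 52.0, 400.0, 0.5, 7.0]
lengths_m = [v * 1000 for v in lengths_km]   # same data, different unit

print(ares_transform(lengths_km))
print(ares_transform(lengths_m))             # identical: only the order matters
```

Because the transform depends only on the ordering of values, changing the unit of measurement leaves the output unchanged, while averaging over sub-samples avoids the strictly uniform spacing of a plain rank transform.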

AIJul 7, 2021
Levels of explainable artificial intelligence for human-aligned conversational explanations

Richard Dazeley, Peter Vamplew, Cameron Foale et al.

Over the last few years there has been rapid research growth into eXplainable Artificial Intelligence (XAI) and the closely aligned Interpretable Machine Learning (IML). Drivers for this growth include recent legislative changes and increased investments by industry and governments, along with increased concern from the general public. People are affected by autonomous decisions every day and the public needs to understand the decision-making process to accept the outcomes. However, the vast majority of the applications of XAI/IML are focused on providing low-level 'narrow' explanations of how an individual decision was reached based on a particular datum. While important, these explanations rarely provide insights into an agent's: beliefs and motivations; hypotheses of other (human, animal or AI) agents' intentions; interpretation of external cultural expectations; or, processes used to generate its own explanation. Yet all of these factors, we propose, are essential to providing the explanatory depth that people require to accept and trust the AI's decision-making. This paper aims to define levels of explanation and describe how they can be integrated to create a human-aligned conversational explanation system. In so doing, this paper will survey current approaches and discuss the integration of different technologies to achieve these levels with Broad eXplainable Artificial Intelligence (Broad-XAI), and thereby move towards high-level 'strong' explanations.

IVDec 31, 2020
New Bag of Deep Visual Words based features to classify chest x-ray images for COVID-19 diagnosis

Chiranjibi Sitaula, Sunil Aryal

Because the infection by Severe Acute Respiratory Syndrome Coronavirus 2 (COVID-19) causes a pneumonia-like effect in the lungs, the examination of chest x-rays can help to diagnose the disease. For automatic analysis of images, they are represented in machines by a set of semantic features. Deep Learning (DL) models are widely used to extract features from images. General deep features may not be appropriate to represent chest x-rays as they have a few semantic regions. Though the Bag of Visual Words (BoVW) based features are shown to be more appropriate for x-ray type of images, existing BoVW features may not capture enough information to differentiate COVID-19 infection from other pneumonia-related infections. In this paper, we propose a new BoVW method over deep features, called Bag of Deep Visual Words (BoDVW), by removing the feature map normalization step and adding a deep feature normalization step on the raw feature maps. This helps to preserve the semantics of each feature map that may have important clues to differentiate COVID-19 from pneumonia. We evaluate the effectiveness of our proposed BoDVW features in chest x-ray classification using a Support Vector Machine (SVM) to diagnose COVID-19. Our results on a publicly available COVID-19 x-ray dataset reveal that our features produce stable and prominent classification accuracy, particularly differentiating COVID-19 infection from other pneumonia, in shorter computation time compared to the state-of-the-art methods. Thus, our method could be a very useful tool for quick diagnosis of COVID-19 patients on a large scale.

CVJun 5, 2020
Content and Context Features for Scene Image Representation

Chiranjibi Sitaula, Sunil Aryal, Yong Xiang et al.

Existing research in scene image classification has focused on either content features (e.g., visual information) or context features (e.g., annotations). As they capture different information about images which can be complementary and useful to discriminate images of different classes, we suppose the fusion of them will improve classification results. In this paper, we propose new techniques to compute content features and context features, and then fuse them together. For content features, we design multi-scale deep features based on background and foreground information in images. For context features, we use annotations of similar images available on the web to design a set of filter words (codebook). Our experiments on three widely used benchmark scene datasets using a support vector machine classifier reveal that our proposed context and content features produce better results than existing context and content features, respectively. The fusion of the proposed two types of features significantly outperforms numerous state-of-the-art features.

CVJun 5, 2020
Scene Image Representation by Foreground, Background and Hybrid Features

Chiranjibi Sitaula, Yong Xiang, Sunil Aryal et al.

Previous methods for representing scene images based on deep learning primarily consider either the foreground or background information as the discriminating clues for the classification task. However, scene images also require additional information (hybrid) to cope with the inter-class similarity and intra-class variation problems. In this paper, we propose to use hybrid features in addition to foreground and background features to represent scene images. We suppose that these three types of information could jointly help to represent scene images more accurately. To this end, we adopt three VGG-16 architectures pre-trained on ImageNet, Places, and Hybrid (both ImageNet and Places) datasets for the corresponding extraction of foreground, background and hybrid information. All three types of deep features are further aggregated to achieve our final features for the representation of scene images. Extensive experiments on two large benchmark scene datasets (MIT-67 and SUN-397) show that our method produces state-of-the-art classification performance.

LGMay 6, 2020
A Comprehensive Survey on Outlying Aspect Mining Methods

Durgesh Samariya, Jiangang Ma, Sunil Aryal

In recent years, researchers have become increasingly interested in outlying aspect mining. Outlying aspect mining is the task of finding a set of feature(s) in which a given data object is different from the rest of the data objects. Remarkably few studies have been designed to address the problem of outlying aspect mining; therefore, little is known among researchers about outlying aspect mining approaches and their strengths and weaknesses. In this work, we have grouped existing outlying aspect mining approaches into three categories. For each category, we have provided existing work that falls in that category and then provided their strengths and weaknesses. We also offer a time complexity comparison of the current techniques since it is a crucial issue in real-world scenarios. The motive behind this paper is to give a better understanding of the existing outlying aspect mining techniques and how these techniques have been developed.

LGApr 28, 2020
A new effective and efficient measure for outlying aspect mining

Durgesh Samariya, Sunil Aryal, Kai Ming Ting

Outlying Aspect Mining (OAM) aims to find the subspaces (a.k.a. aspects) in which a given query is an outlier with respect to a given dataset. Existing OAM algorithms use traditional distance/density-based outlier scores to rank subspaces. Because these distance/density-based scores depend on the dimensionality of subspaces, they cannot be compared directly between subspaces of different dimensionality. $Z$-score normalisation has been used to make them comparable. This requires computing the outlier scores of all instances in each subspace, which adds significant computational overhead on top of already expensive density estimation and makes OAM algorithms infeasible to run on large and/or high-dimensional datasets. We also discover that $Z$-score normalisation is inappropriate for OAM in some cases. In this paper, we introduce a new score called SiNNE, which is independent of the dimensionality of subspaces. This enables the scores in subspaces with different dimensionalities to be compared directly without any additional normalisation. Our experimental results revealed that SiNNE produces better or at least the same results as existing scores, and it significantly improves the runtime of an existing OAM algorithm based on beam search.
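The $Z$-score normalisation step that the abstract criticises can be illustrated with a minimal sketch. It assumes a k-nearest-neighbour distance as the raw outlier score, which is only one example of a distance-based score; the function names and toy data are hypothetical and are not taken from the paper. SiNNE itself is not reproduced here because the abstract does not define it.

```python
import math
from statistics import mean, pstdev

def knn_distance_score(idx, data, subspace, k=1):
    """Raw outlier score: distance from data[idx] to its k-th nearest
    neighbour, measured only on the features listed in `subspace`."""
    p = [data[idx][j] for j in subspace]
    dists = sorted(
        math.dist(p, [x[j] for j in subspace])
        for i, x in enumerate(data) if i != idx
    )
    return dists[k - 1]

def z_normalised_score(query_idx, data, subspace, k=1):
    """Z-score of the query's raw score within one subspace.
    Note the cost: the raw score of every instance must be computed."""
    scores = [knn_distance_score(i, data, subspace, k) for i in range(len(data))]
    mu, sigma = mean(scores), pstdev(scores)
    return (scores[query_idx] - mu) / sigma if sigma > 0 else 0.0

# Toy data (illustrative): the last point is unusual in feature 0 only.
data = [(0.0, 0.0), (0.1, 1.0), (0.2, 2.0), (0.3, 3.0), (5.0, 4.0)]
query = 4

raw_1d = knn_distance_score(query, data, (0,))
raw_2d = knn_distance_score(query, data, (0, 1))
z_f0 = z_normalised_score(query, data, (0,))
z_f1 = z_normalised_score(query, data, (1,))
```

With Euclidean distance the raw score cannot decrease when a feature is added to the subspace (`raw_2d >= raw_1d`), which is why raw scores are biased towards higher-dimensional subspaces. The normalised score removes that bias but needs one full pass over the dataset per candidate subspace.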

CVMar 22, 2020
HDF: Hybrid Deep Features for Scene Image Representation

Chiranjibi Sitaula, Yong Xiang, Anish Basnet et al.

Nowadays it is prevalent to take features extracted from pre-trained deep learning models as image representations, which have achieved promising classification performance. Existing methods usually consider either object-based features or scene-based features only. However, both types of features are important for complex images like scene images, as they can complement each other. In this paper, we propose a novel type of feature, hybrid deep features, for scene images. Specifically, we exploit both object-based and scene-based features at two levels: part image level (i.e., parts of an image) and whole image level (i.e., a whole image), which produces a total of four types of deep features. Regarding the part image level, we also propose two new slicing techniques to extract part-based features. Finally, we aggregate these four types of deep features via the concatenation operator. We demonstrate the effectiveness of our hybrid deep features on three commonly used scene datasets (MIT-67, Scene-15, and Event-8), in terms of the scene image classification task. Extensive comparisons show that our introduced features can produce state-of-the-art classification accuracies which are more consistent and stable than the results of existing features across all datasets.

LGSep 27, 2019
Improved histogram-based anomaly detector with the extended principal component features

Sunil Aryal, Arbind Agrahari Baniya, KC Santosh

In this era of big data, databases are growing rapidly in terms of the number of records. Fast automatic detection of anomalous records in these massive databases is a challenging task. Traditional distance-based anomaly detectors are not applicable to these massive datasets. Recently, a simple but extremely fast anomaly detector using one-dimensional histograms has been introduced. The anomaly score of a data instance is computed as the product, over all dimensions, of the probability mass of the histogram bin into which it falls. It is shown to produce competitive results compared to many state-of-the-art methods on many datasets. Because it assumes data features are independent of each other, it results in poor detection accuracy when there is correlation between features. To address this issue, we propose to increase the feature size by adding more features based on principal components. Our results show that using the original input features together with principal components improves the detection accuracy of the histogram-based anomaly detector significantly without compromising much in terms of run-time.
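The idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses equal-width bins, appends the projections onto all principal components, and works with log probabilities so that the product of bin masses becomes a sum. The function names, the bin count and the toy data are all hypothetical.

```python
import numpy as np

def add_principal_components(X):
    """Append the projections onto the principal components to the
    original features (assumption: all components are kept)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centred data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return np.hstack([X, Xc @ Vt.T])

def histogram_scores(X, bins=5):
    """Log of the product of per-dimension histogram bin masses.
    Lower values indicate more anomalous instances."""
    n, d = X.shape
    log_score = np.zeros(n)
    for j in range(d):
        edges = np.histogram_bin_edges(X[:, j], bins=bins)
        # Interior edges map every value to a bin index in 0..bins-1.
        idx = np.digitize(X[:, j], edges[1:-1])
        counts = np.bincount(idx, minlength=bins)
        log_score += np.log(counts[idx] / n)
    return log_score

# Toy data: 50 points on the line y = x plus one point that breaks the
# correlation while staying inside the normal range of each feature.
t = np.linspace(0.0, 1.0, 50)
X = np.vstack([np.column_stack([t, t]), [[0.1, 0.9]]])
anomaly = len(X) - 1

plain = histogram_scores(X)
augmented = histogram_scores(add_principal_components(X))
```

On this toy data the off-diagonal point is not the lowest-scoring instance under `plain`, because each of its coordinates is ordinary on its own. Under `augmented` it becomes the lowest-scoring instance, since it is isolated along the second principal component.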

CVSep 24, 2019
Unsupervised Deep Features for Privacy Image Classification

Chiranjibi Sitaula, Yong Xiang, Sunil Aryal et al.

Sharing images online poses security threats to a wide range of users due to the unawareness of privacy information. Deep features have been demonstrated to be a powerful representation for images. However, deep features usually suffer from two issues: their large size and the need for a huge amount of data for fine-tuning. In contrast to normal images (e.g., scene images), privacy images are often limited because of sensitive information. In this paper, we propose a novel approach that can work on limited data and generate deep features of smaller size. For training images, we first extract the initial deep features from the pre-trained model and then employ the K-means clustering algorithm to learn the centroids of these initial deep features. We use the learned centroids from training features to extract the final features for each testing image and encode our final features with the triangle encoding. To improve the discriminability of the features, we further perform the fusion of two proposed unsupervised deep features obtained from different layers. Experimental results show that the proposed features outperform state-of-the-art deep features, in terms of both classification accuracy and testing time.
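The centroid-learning and encoding steps can be sketched as below. This assumes the commonly used form of triangle encoding, f_k(x) = max(0, mean(z) - z_k), where z_k is the distance from x to centroid k; the abstract does not state the exact variant, so treat this as an illustration only. The plain Lloyd-style K-means, the random stand-in features and all names are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd iterations; returns the k learned centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return centroids

def triangle_encode(X, centroids):
    """Triangle encoding: activation k is max(0, mean distance - distance
    to centroid k), so far-away centroids contribute exactly zero."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.maximum(0.0, dists.mean(axis=1, keepdims=True) - dists)

# Random stand-ins for deep features taken from a pre-trained model.
rng = np.random.default_rng(42)
train_feats = rng.normal(size=(100, 8))
test_feats = rng.normal(size=(10, 8))

centroids = kmeans(train_feats, k=4)          # learned on training features
codes = triangle_encode(test_feats, centroids)  # final features, size k
```

The encoded feature has one value per centroid, so its size is controlled by `k` rather than by the dimensionality of the original deep features, which is consistent with the stated goal of a smaller feature size.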

CVSep 22, 2019
Tag-based Semantic Features for Scene Image Classification

Chiranjibi Sitaula, Yong Xiang, Anish Basnet et al.

The existing image feature extraction methods are primarily based on the content and structure information of images, and rarely consider the contextual semantic information. Regarding some types of images such as scenes and objects, the annotations and descriptions of them available on the web may provide reliable contextual semantic information for feature extraction. In this paper, we introduce novel semantic features of an image based on the annotations and descriptions of its similar images available on the web. Specifically, we propose a new method which consists of two consecutive steps to extract our semantic features. For each image in the training set, we initially search the internet for the top $k$ most similar images and extract their annotations/descriptions (e.g., tags or keywords). The annotation information is employed to design a filter bank for each image category and generate filter words (codebook). Finally, each image is represented by the histogram of the occurrences of filter words in all categories. We evaluate the performance of the proposed features in scene image classification on three commonly-used scene image datasets (i.e., MIT-67, Scene15 and Event8). Our method typically produces a lower feature dimension than existing feature extraction methods. Experimental results show that the proposed features generate better classification accuracies than vision-based and tag-based features, and comparable results to deep learning-based features.
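The final representation step, a histogram of filter-word occurrences across all categories, might look like the following sketch. The abstract does not say how the filter words are selected or how tags are retrieved, so the codebook and tags below are invented placeholders, and the function name is hypothetical.

```python
from collections import Counter

def tag_histogram(image_tags, filter_words_by_category):
    """Count how often each filter word occurs among the tags gathered
    for one image. Categories are visited in sorted order so that the
    feature layout is identical for every image."""
    counts = Counter(image_tags)
    return [
        counts[word]
        for category in sorted(filter_words_by_category)
        for word in filter_words_by_category[category]
    ]

# Hypothetical codebook: a few filter words per scene category.
filter_words = {
    "kitchen": ["oven", "sink"],
    "beach": ["sand", "sea"],
}

# Hypothetical tags collected from an image's most similar web images.
tags = ["sea", "sand", "sea", "boat"]

feature = tag_histogram(tags, filter_words)
print(feature)  # [1, 2, 0, 0] -> sand, sea, oven, sink
```

The feature length equals the total number of filter words, which is in line with the abstract's claim of a lower feature dimension than existing methods. Tags outside the codebook, such as "boat" here, are ignored.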

CVJun 12, 2019
Indoor image representation by high-level semantic features

Chiranjibi Sitaula, Yong Xiang, Yushu Zhang et al.

Indoor image feature extraction is a fundamental problem in multiple fields such as image processing, pattern recognition, and robotics. Nevertheless, most of the existing feature extraction methods, which extract features based on pixels, color, shape/object parts or objects in images, suffer from limited capabilities in describing semantic information (e.g., object association). These techniques, therefore, yield unsatisfactory classification performance. To tackle this issue, we propose the notion of high-level semantic features and design four steps to extract them. Specifically, we first construct the objects pattern dictionary through extracting raw objects in the images, and then retrieve and extract semantic objects from the objects pattern dictionary. We finally extract our high-level semantic features based on the calculated probability and delta parameter. Experiments on three publicly available datasets (MIT-67, Scene15 and NYU V1) show that our feature extraction approach outperforms state-of-the-art feature extraction methods for indoor image classification, given a lower dimension of our features than those methods.