LGNov 13, 2023
Explaining black boxes with a SMILE: Statistical Model-agnostic Interpretability with Local ExplanationsKoorosh Aslansefat, Mojgan Hashemian, Martin Walker et al.
Machine learning is currently undergoing an explosion in capability, popularity, and sophistication. However, one of the major barriers to widespread acceptance of machine learning (ML) is trustworthiness: most ML models operate as black boxes, their inner workings opaque and mysterious, and it can be difficult to trust their conclusions without understanding how those conclusions are reached. Explainability is therefore a key aspect of improving trustworthiness: the ability to better understand, interpret, and anticipate the behaviour of ML models. To this end, we propose SMILE, a new method that builds on previous approaches by making use of statistical distance measures to improve explainability while remaining applicable to a wide range of input data domains.
LGJul 19, 2022
A Deep Learning Framework for Wind Turbine Repair Action Prediction Using Alarm Sequences and Long Short Term Memory AlgorithmsConnor Walker, Callum Rothon, Koorosh Aslansefat et al.
With an increasing emphasis on driving down the costs of Operations and Maintenance (O&M) in the Offshore Wind (OSW) sector, comes the requirement to explore new methodology and applications of Deep Learning (DL) to the domain. Condition-based monitoring (CBM) has been at the forefront of recent research developing alarm-based systems and data-driven decision making. This paper provides a brief insight into the research being conducted in this area, with a specific focus on alarm sequence modelling and the associated challenges faced in its implementation. The paper proposes a novel idea to predict a set of relevant repair actions from an input sequence of alarm sequences, comparing Long Short-term Memory (LSTM) and Bidirectional LSTM (biLSTM) models. Achieving training accuracy results of up to 80.23%, and test accuracy results of up to 76.01% with biLSTM gives a strong indication to the potential benefits of the proposed approach that can be furthered in future research. The paper introduces a framework that integrates the proposed approach into O$\&$M procedures and discusses the potential benefits which include the reduction of a confusing plethora of alarms, as well as unnecessary vessel transfers to the turbines for fault diagnosis and correction.
AINov 23, 2022
Online Dynamic Reliability Evaluation of Wind Turbines based on Drone-assisted MonitoringSohag Kabir, Koorosh Aslansefat, Prosanta Gope et al.
The offshore wind energy is increasingly becoming an attractive source of energy due to having lower environmental impact. Effective operation and maintenance that ensures the maximum availability of the energy generation process using offshore facilities and minimal production cost are two key factors to improve the competitiveness of this energy source over other traditional sources of energy. Condition monitoring systems are widely used for health management of offshore wind farms to have improved operation and maintenance. Reliability of the wind farms are increasingly being evaluated to aid in the maintenance process and thereby to improve the availability of the farms. However, much of the reliability analysis is performed offline based on statistical data. In this article, we propose a drone-assisted monitoring based method for online reliability evaluation of wind turbines. A blade system of a wind turbine is used as an illustrative example to demonstrate the proposed approach.
SPMar 19
A Multi-Modal Dataset for Ground Reaction Force Estimation Using Consumer Wearable SensorsParvin Ghaffarzadeh, Debarati Chakraborty, Koorosh Aslansefat et al.
This Data Descriptor presents a fully open, multi-modal dataset for estimating vertical ground reaction force (vGRF) from consumer-grade Apple Watch sensors with laboratory force plate ground truth. Ten healthy adults aged 26--41 years performed five activities: walking, jogging, running, heel drops, and step drops, while wearing two Apple Watches positioned at the left wrist and waist. The dataset contains 492 validated trials with time-aligned inertial measurement unit (IMU) recordings (approximately 100 Hz) and force plate vGRF (Force\_Z, 1000 Hz). The release includes raw and processed time series, trial-level metadata, quality-control flags, and machine-readable data dictionaries. Trial-level matching manifests link recordings across modalities using stable identifiers. Of the 492 validated trials, 395 are triad-complete, containing wrist, waist, and force plate data, enabling cross-sensor analyses and reproducible model evaluation. Dataset quality is characterised through a three-phase cross-sensor plausibility and consistency framework, repeatability analysis of peak vGRF (intraclass correlation coefficient 0.871--0.990), and systematic checks of force ranges and trial completeness. Monte Carlo sensitivity analysis showed that correlation-based validation metrics were robust to single-sample timing perturbations at the IMU sampling resolution. All data are released under CC BY 4.0, with analysis scripts archived alongside the dataset and mirrored on GitHub. This resource supports reproducible research in wearable biomechanics, benchmarking of machine learning models for vGRF estimation, and investigation of sensor placement effects using widely available consumer wearables.
CVDec 9, 2025
Mitigating Individual Skin Tone Bias in Skin Lesion Classification through Distribution-Aware ReweightingKuniko Paxton, Zeinab Dehghani, Koorosh Aslansefat et al.
Skin color has historically been a focal point of discrimination, yet fairness research in machine learning for medical imaging often relies on coarse subgroup categories, overlooking individual-level variations. Such group-based approaches risk obscuring biases faced by outliers within subgroups. This study introduces a distribution-based framework for evaluating and mitigating individual fairness in skin lesion classification. We treat skin tone as a continuous attribute rather than a categorical label, and employ kernel density estimation (KDE) to model its distribution. We further compare twelve statistical distance metrics to quantify disparities between skin tone distributions and propose a distance-based reweighting (DRW) loss function to correct underrepresentation in minority tones. Experiments across CNN and Transformer models demonstrate: (i) the limitations of categorical reweighting in capturing individual-level disparities, and (ii) the superior performance of distribution-based reweighting, particularly with Fidelity Similarity (FS), Wasserstein Distance (WD), Hellinger Metric (HM), and Harmonic Mean Similarity (HS). These findings establish a robust methodology for advancing fairness at individual level in dermatological AI systems, and highlight broader implications for sensitive continuous attributes in medical image analysis.
CVDec 9, 2025
Skewness-Guided Pruning of Multimodal Swin Transformers for Federated Skin Lesion Classification on Edge DevicesKuniko Paxton, Koorosh Aslansefat, Dhavalkumar Thakker et al.
In recent years, high-performance computer vision models have achieved remarkable success in medical imaging, with some skin lesion classification systems even surpassing dermatology specialists in diagnostic accuracy. However, such models are computationally intensive and large in size, making them unsuitable for deployment on edge devices. In addition, strict privacy constraints hinder centralized data management, motivating the adoption of Federated Learning (FL). To address these challenges, this study proposes a skewness-guided pruning method that selectively prunes the Multi-Head Self-Attention and Multi-Layer Perceptron layers of a multimodal Swin Transformer based on the statistical skewness of their output distributions. The proposed method was validated in a horizontal FL environment and shown to maintain performance while substantially reducing model complexity. Experiments on the compact Swin Transformer demonstrate approximately 36\% model size reduction with no loss in accuracy. These findings highlight the feasibility of achieving efficient model compression and privacy-preserving distributed learning for multimodal medical AI on edge devices.
LGMay 27, 2020Code
SafeML: Safety Monitoring of Machine Learning Classifiers through Statistical Difference MeasureKoorosh Aslansefat, Ioannis Sorokos, Declan Whiting et al.
Ensuring safety and explainability of machine learning (ML) is a topic of increasing relevance as data-driven applications venture into safety-critical application domains, traditionally committed to high safety standards that are not satisfied with an exclusive testing approach of otherwise inaccessible black-box systems. Especially the interaction between safety and security is a central challenge, as security violations can lead to compromised safety. The contribution of this paper to addressing both safety and security within a single concept of protection applicable during the operation of ML systems is active monitoring of the behaviour and the operational context of the data-driven system based on distance measures of the Empirical Cumulative Distribution Function (ECDF). We investigate abstract datasets (XOR, Spiral, Circle) and current security-specific datasets for intrusion detection (CICIDS2017) of simulated network traffic, using distributional shift detection measures including the Kolmogorov-Smirnov, Kuiper, Anderson-Darling, Wasserstein and mixed Wasserstein-Anderson-Darling measures. Our preliminary findings indicate that the approach can provide a basis for detecting whether the application context of an ML component is valid in the safety-security. Our preliminary code and results are available at https://github.com/ISorokos/SafeML.
CVJun 13, 2025
Evaluating Fairness and Mitigating Bias in Machine Learning: A Novel Technique using Tensor Data and Bayesian RegressionKuniko Paxton, Koorosh Aslansefat, Dhavalkumar Thakker et al.
Fairness is a critical component of Trustworthy AI. In this paper, we focus on Machine Learning (ML) and the performance of model predictions when dealing with skin color. Unlike other sensitive attributes, the nature of skin color differs significantly. In computer vision, skin color is represented as tensor data rather than categorical values or single numerical points. However, much of the research on fairness across sensitive groups has focused on categorical features such as gender and race. This paper introduces a new technique for evaluating fairness in ML for image classification tasks, specifically without the use of annotation. To address the limitations of prior work, we handle tensor data, like skin color, without classifying it rigidly. Instead, we convert it into probability distributions and apply statistical distance measures. This novel approach allows us to capture fine-grained nuances in fairness both within and across what would traditionally be considered distinct groups. Additionally, we propose an innovative training method to mitigate the latent biases present in conventional skin tone categorization. This method leverages color distance estimates calculated through Bayesian regression with polynomial functions, ensuring a more nuanced and equitable treatment of skin color in ML models.
CLMay 27, 2025
Explaining Large Language Models with gSMILEZeinab Dehghani, Mohammed Naveed Akram, Koorosh Aslansefat et al.
Large Language Models (LLMs) such as GPT, LLaMA, and Claude achieve remarkable performance in text generation but remain opaque in their decision-making processes, limiting trust and accountability in high-stakes applications. We present gSMILE (generative SMILE), a model-agnostic, perturbation-based framework for token-level interpretability in LLMs. Extending the SMILE methodology, gSMILE uses controlled prompt perturbations, Wasserstein distance metrics, and weighted linear surrogates to identify input tokens with the most significant impact on the output. This process enables the generation of intuitive heatmaps that visually highlight influential tokens and reasoning paths. We evaluate gSMILE across leading LLMs (OpenAI's gpt-3.5-turbo-instruct, Meta's LLaMA 3.1 Instruct Turbo, and Anthropic's Claude 2.1) using attribution fidelity, attribution consistency, attribution stability, attribution faithfulness, and attribution accuracy as metrics. Results show that gSMILE delivers reliable human-aligned attributions, with Claude 2.1 excelling in attention fidelity and GPT-3.5 achieving the highest output consistency. These findings demonstrate gSMILE's ability to balance model performance and interpretability, enabling more transparent and trustworthy AI systems.
CVMar 31
Exploring the Impact of Skin Color on Skin Lesion SegmentationKuniko Paxton, Medina Kapo, Amila AkagiÄ et al.
Skin cancer, particularly melanoma, remains a major cause of morbidity and mortality, making early detection critical. AI-driven dermatology systems often rely on skin lesion segmentation as a preprocessing step to delineate the lesion from surrounding skin and support downstream analysis. While fairness concerns regarding skin tone have been widely studied for lesion classification, the influence of skin tone on the segmentation stage remains under-quantified and is frequently assessed using coarse, discrete skin tone categories. In this work, we evaluate three strong segmentation architectures (UNet, DeepLabV3 with a ResNet50 backbone, and DINOv2) on two public dermoscopic datasets (HAM10000 and ISIC2017) and introduce a continuous pigment or contrast analysis that treats pixel-wise ITA values as distributions. Using Wasserstein distances between within-image distributions for skin-only, lesion-only, and whole-image regions, we quantify lesion skin contrast and relate it to segmentation performance across multiple metrics. Within the range represented in these datasets, global skin tone metrics (Fitzpatrick grouping or mean ITA) show weak association with segmentation quality. In contrast, low lesion-skin contrast is consistently associated with larger segmentation errors in models, indicating that boundary ambiguity and low contrast are key drivers of failure. These findings suggest that fairness improvements in dermoscopic segmentation should prioritize robust handling of low-contrast lesions, and the distribution-based pigment measures provide a more informative audit signal than discrete skin-tone categories.
LGSep 4, 2025
Q-SafeML: Safety Assessment of Quantum Machine Learning via Quantum Distance MetricsOliver Dunn, Koorosh Aslansefat, Yiannis Papadopoulos
The rise of machine learning in safety-critical systems has paralleled advancements in quantum computing, leading to the emerging field of Quantum Machine Learning (QML). While safety monitoring has progressed in classical ML, existing methods are not directly applicable to QML due to fundamental differences in quantum computation. Given the novelty of QML, dedicated safety mechanisms remain underdeveloped. This paper introduces Q-SafeML, a safety monitoring approach for QML. The method builds on SafeML, a recent method that utilizes statistical distance measures to assess model accuracy and provide confidence in the reasoning of an algorithm. An adapted version of Q-SafeML incorporates quantum-centric distance measures, aligning with the probabilistic nature of QML outputs. This shift to a model-dependent, post-classification evaluation represents a key departure from classical SafeML, which is dataset-driven and classifier-agnostic. The distinction is motivated by the unique representational constraints of quantum systems, requiring distance metrics defined over quantum state spaces. Q-SafeML detects distances between operational and training data addressing the concept drifts in the context of QML. Experiments on QCNN and VQC Models show that this enables informed human oversight, enhancing system transparency and safety.
AISep 3, 2025
RAGuard: A Novel Approach for in-context Safe Retrieval Augmented Generation for LLMsConnor Walker, Koorosh Aslansefat, Mohammad Naveed Akram et al.
Accuracy and safety are paramount in Offshore Wind (OSW) maintenance, yet conventional Large Language Models (LLMs) often fail when confronted with highly specialised or unexpected scenarios. We introduce RAGuard, an enhanced Retrieval-Augmented Generation (RAG) framework that explicitly integrates safety-critical documents alongside technical manuals.By issuing parallel queries to two indices and allocating separate retrieval budgets for knowledge and safety, RAGuard guarantees both technical depth and safety coverage. We further develop a SafetyClamp extension that fetches a larger candidate pool, "hard-clamping" exact slot guarantees to safety. We evaluate across sparse (BM25), dense (Dense Passage Retrieval) and hybrid retrieval paradigms, measuring Technical Recall@K and Safety Recall@K. Both proposed extensions of RAG show an increase in Safety Recall@K from almost 0\% in RAG to more than 50\% in RAGuard, while maintaining Technical Recall above 60\%. These results demonstrate that RAGuard and SafetyClamp have the potential to establish a new standard for integrating safety assurance into LLM-powered decision support in critical maintenance contexts.
CVAug 31, 2025
Enhancing Fairness in Skin Lesion Classification for Medical Diagnosis Using Prune LearningKuniko Paxton, Koorosh Aslansefat, Dhavalkumar Thakker et al.
Recent advances in deep learning have significantly improved the accuracy of skin lesion classification models, supporting medical diagnoses and promoting equitable healthcare. However, concerns remain about potential biases related to skin color, which can impact diagnostic outcomes. Ensuring fairness is challenging due to difficulties in classifying skin tones, high computational demands, and the complexity of objectively verifying fairness. To address these challenges, we propose a fairness algorithm for skin lesion classification that overcomes the challenges associated with achieving diagnostic fairness across varying skin tones. By calculating the skewness of the feature map in the convolution layer of the VGG (Visual Geometry Group) network and the patches and the heads of the Vision Transformer, our method reduces unnecessary channels related to skin tone, focusing instead on the lesion area. This approach lowers computational costs and mitigates bias without relying on conventional statistical methods. It potentially reduces model size while maintaining fairness, making it more practical for real-world applications.
CVAug 28, 2025
Safer Skin Lesion Classification with Global Class Activation Probability Map Evaluation and SafeMLKuniko Paxton, Koorosh Aslansefat, Amila Akagić et al.
Recent advancements in skin lesion classification models have significantly improved accuracy, with some models even surpassing dermatologists' diagnostic performance. However, in medical practice, distrust in AI models remains a challenge. Beyond high accuracy, trustworthy, explainable diagnoses are essential. Existing explainability methods have reliability issues, with LIME-based methods suffering from inconsistency, while CAM-based methods failing to consider all classes. To address these limitations, we propose Global Class Activation Probabilistic Map Evaluation, a method that analyses all classes' activation probability maps probabilistically and at a pixel level. By visualizing the diagnostic process in a unified manner, it helps reduce the risk of misdiagnosis. Furthermore, the application of SafeML enhances the detection of false diagnoses and issues warnings to doctors and patients as needed, improving diagnostic reliability and ultimately patient safety. We evaluated our method using the ISIC datasets with MobileNetV2 and Vision Transformers.
SEJun 3, 2021
DEIS: Dependability Engineering Innovation for Industrial CPSErik Armengaud, Georg Macher, Alexander Massoner et al.
The open and cooperative nature of Cyber-Physical Systems (CPS) poses new challenges in assuring dependability. The DEIS project (Dependability Engineering Innovation for automotive CPS. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 732242, see http://www.deis-project.eu) addresses these challenges by developing technologies that form a science of dependable system integration. In the core of these technologies lies the concept of a Digital Dependability Identity (DDI) of a component or system. DDIs are modular, composable, and executable in the field facilitating (a) efficient synthesis of component and system dependability information over the supply chain and (b) effective evaluation of this information in-the-field for safe and secure composition of highly distributed and autonomous CPS. The paper outlines the DDI concept and opportunities for application in four industrial use cases.
SEMay 11, 2020
Failure Mode Reasoning in Model Based Safety AnalysisHamid Jahanian, David Parker, Marc Zeller et al.
Failure Mode Reasoning (FMR) is a novel approach for analyzing failure in a Safety Instrumented System (SIS). The method uses an automatic analysis of an SIS program to calculate potential failures in parts of the SIS. In this paper we use a case study from the power industry to demonstrate how FMR can be utilized in conjunction with other model-based safety analysis methods, such as HiP-HOPS and CFT, in order to achieve a comprehensive safety analysis of SIS. In this case study, FMR covers the analysis of SIS inputs while HiP-HOPS/CFT models the faults of logic solver and final elements. The SIS program is analyzed by FMR and the results are exported to HiP-HOPS/CFT via automated interfaces. The final outcome is the collective list of SIS failure modes along with their reliability measures. We present and review the results from both qualitative and quantitative perspectives.
NESep 26, 2018
PeSOA: Penguins Search Optimisation Algorithm for Global Optimisation ProblemsYoucef Gheraibia, Abdelouahab Moussaoui, Peng-Yeng Yin et al.
This paper develops Penguin search Optimisation Algorithm (PeSOA), a new metaheuristic algorithm which is inspired by the foraging behaviours of penguins. A population of penguins located in the solution space of the given search and optimisation problem is divided into groups and tasked with finding optimal solutions. The penguins of a group perform simultaneous dives and work as a team to collaboratively feed on fish the energy content of which corresponds to the fitness of candidate solutions. Fish stocks have higher fitness and concentration near areas of solution optima and thus drive the search. Penguins can migrate to other places if their original habitat lacks food. We identify two forms of penguin communication both intra-group and inter-group which are useful in designing intensification and diversification strategies. An efficient intensification strategy allows fast convergence to a local optimum, whereas an effective diversification strategy avoids cyclic behaviour around local optima and explores more effectively the space of potential solutions. The proposed PeSOA algorithm has been validated on a well-known set of benchmark functions. Comparative performances with six other nature-inspired metaheuristics show that the PeSOA performs favourably in these tests. A run-time analysis shows that the performance obtained by the PeSOA is very stable at any time of the evolution horizon, making the PeSOA a viable approach for real world applications.