Zahraa S. Abdallah

LG
h-index20
17papers
42citations
Novelty39%
AI Score47

17 Papers

LGJun 3
Expectations vs. Realities: The Cost of MSE-Optimal Forecasting Under Conditional Uncertainty

Riku Green, Zahraa S. Abdallah, Telmo M Silva Filho

Multi-step time series forecasting (MSF) is commonly evaluated using point-wise error metrics such as mean squared error (MSE), implicitly treating the conditional mean as a sufficient target. We show that this can be misleading under conditional uncertainty, where the conditional expectation becomes unrepresentative of typical realized values at longer horizons. We formalize this effect through a conditional uncertainty gap and prove that whenever this gap is nonzero, no deterministic predictor can simultaneously minimize MSE and match the marginal distribution of realized futures. This establishes a fundamental, model-agnostic trade-off between point accuracy and marginal realism in MSF evaluation. Using controlled stochastic dynamical systems and nine real-world forecasting benchmarks, we empirically characterize the resulting accuracy--realism frontier and \textbf{quantify the practical cost of MSE-only model selection}. As conditional uncertainty increases with forecast horizon, the attainable set expands into a pronounced Pareto front, separating MSE-optimal but under-dispersed predictors from methods that trade accuracy for realistic marginal variability. \textbf{Across benchmarks, we find that small relaxations in MSE ($\boldsymbol{\le 5\%}$) frequently unlock disproportionate gains in marginal realism, with median improvements of $\mathbf{17.3\%}$ and gains exceeding $\mathbf{30\%}$ in some datasets.} We further show that common forecasting strategies systematically occupy different regions of this frontier: direct multi-output predictors concentrate near the accuracy-optimal extreme, while recursive strategies and sample-based inference favors marginal realism. Together, these results expose a structural failure mode of MSE-based evaluation in long-horizon forecasting and recast strategy and inference selection as navigation of an unavoidable accuracy--realism trade-off.

NCJan 20, 2023
Interpretable Classification of Early Stage Parkinson's Disease from EEG

Amarpal Sahota, Amber Roguski, Matthew W. Jones et al.

Detecting Parkinson's Disease in its early stages using EEG data presents a significant challenge. This paper introduces a novel approach, representing EEG data as a 15-variate series of bandpower and peak frequency values/coefficients. The hypothesis is that this representation captures essential information from the noisy EEG signal, improving disease detection. Statistical features extracted from this representation are utilised as input for interpretable machine learning models, specifically Decision Tree and AdaBoost classifiers. Our classification pipeline is deployed within our proposed framework which enables high-importance data types and brain regions for classification to be identified. Interestingly, our analysis reveals that while there is no significant regional importance, the N1 sleep data type exhibits statistically significant predictive power (p < 0.01) for early-stage Parkinson's Disease classification. AdaBoost classifiers trained on the N1 data type consistently outperform baseline models, achieving over 80% accuracy and recall. Our classification pipeline statistically significantly outperforms baseline models indicating that the model has acquired useful information. Paired with the interpretability (ability to view feature importance's) of our pipeline this enables us to generate meaningful insights into the classification of early stage Parkinson's with our N1 models. In Future, these models could be deployed in the real world - the results presented in this paper indicate that more than 3 in 4 early-stage Parkinson's cases would be captured with our pipeline.

NCAug 1, 2024
Investigating Brain Connectivity and Regional Statistics from EEG for early stage Parkinson's Classification

Amarpal Sahota, Amber Roguski, Matthew W Jones et al.

We evaluate the effectiveness of combining brain connectivity metrics with signal statistics for early stage Parkinson's Disease (PD) classification using electroencephalogram data (EEG). The data is from 5 arousal states - wakeful and four sleep stages (N1, N2, N3 and REM). Our pipeline uses an Ada Boost model for classification on a challenging early stage PD classification task with with only 30 participants (11 PD , 19 Healthy Control). Evaluating 9 brain connectivity metrics we find the best connectivity metric to be different for each arousal state with Phase Lag Index achieving the highest individual classification accuracy of 86\% on N1 data. Further to this our pipeline using regional signal statistics achieves an accuracy of 78\%, using brain connectivity only achieves an accuracy of 86\% whereas combining the two achieves a best accuracy of 91\%. This best performance is achieved on N1 data using Phase Lag Index (PLI) combined with statistics derived from the frequency characteristics of the EEG signal. This model also achieves a recall of 80 \% and precision of 96\%. Furthermore we find that on data from each arousal state, combining PLI with regional signal statistics improves classification accuracy versus using signal statistics or brain connectivity alone. Thus we conclude that combining brain connectivity statistics with regional EEG statistics is optimal for classifier performance on early stage Parkinson's. Additionally, we find outperformance of N1 EEG for classification of Parkinson's and expect this could be due to disrupted N1 sleep in PD. This should be explored in future work.

LGNov 14, 2022
Temporal patterns in insulin needs for Type 1 diabetes

Isabella Degen, Zahraa S. Abdallah

Type 1 Diabetes (T1D) is a chronic condition where the body produces little or no insulin, a hormone required for the cells to use blood glucose (BG) for energy and to regulate BG levels in the body. Finding the right insulin dose and time remains a complex, challenging and as yet unsolved control task. In this study, we use the OpenAPS Data Commons dataset, which is an extensive dataset collected in real-life conditions, to discover temporal patterns in insulin need driven by well-known factors such as carbohydrates as well as potentially novel factors. We utilised various time series techniques to spot such patterns using matrix profile and multi-variate clustering. The better we understand T1D and the factors impacting insulin needs, the more we can contribute to building data-driven technology for T1D treatments.

LGJul 12, 2022
IMG-NILM: A Deep learning NILM approach using energy heatmaps

Jonah Edmonds, Zahraa S. Abdallah

Energy disaggregation estimates appliance-by-appliance electricity consumption from a single meter that measures the whole home's electricity demand. Compared with intrusive load monitoring, NILM (Non-intrusive load monitoring) is low cost, easy to deploy, and flexible. In this paper, we propose a new method, coined IMG-NILM, that utilises convolutional neural networks (CNN) to disaggregate electricity data represented as images. Instead of the traditional approach of dealing with electricity data as time series, IMG-NILM transforms time series into heatmaps with higher electricity readings portrayed as 'hotter' colours. The image representation is then used in CNN to detect the signature of an appliance from aggregated data. IMG-NILM is robust and flexible with consistent performance on various types of appliances; including single and multiple states. It attains a test accuracy of up to 93% on the UK-Dale dataset within a single house, where a substantial number of appliances are present. In more challenging settings where electricity data is collected from different houses, IMG-NILM attains also a very good average accuracy of 85%.

CVApr 13, 2022
Investigating Temporal Convolutional Neural Networks for Satellite Image Time Series Classification: A survey

James Brock, Zahraa S. Abdallah

Satellite Image Time Series (SITS) of the Earth's surface provide detailed land cover maps, with their quality in the spatial and temporal dimensions consistently improving. These image time series are integral for developing systems that aim to produce accurate, up-to-date land cover maps of the Earth's surface. Applications are wide-ranging, with notable examples including ecosystem mapping, vegetation process monitoring and anthropogenic land-use change tracking. Recently proposed methods for SITS classification have demonstrated respectable merit, but these methods tend to lack native mechanisms that exploit the temporal dimension of the data; commonly resulting in extensive data pre-processing contributing to prohibitively long training times. To overcome these shortcomings, Temporal CNNs have recently been employed for SITS classification tasks with encouraging results. This paper seeks to survey this method against a plethora of other contemporary methods for SITS classification to validate the existing findings in recent literature. Comprehensive experiments are carried out on two benchmark SITS datasets with the results demonstrating that Temporal CNNs display a superior performance to the comparative benchmark algorithms across both studied datasets, achieving accuracies of 95.0\% and 87.3\% respectively. Investigations into the Temporal CNN architecture also highlighted the non-trivial task of optimising the model for a new dataset.

LGJul 25, 2023
RED CoMETS: An ensemble classifier for symbolically represented multivariate time series

Luca A. Bennett, Zahraa S. Abdallah

Multivariate time series classification is a rapidly growing research field with practical applications in finance, healthcare, engineering, and more. The complexity of classifying multivariate time series data arises from its high dimensionality, temporal dependencies, and varying lengths. This paper introduces a novel ensemble classifier called RED CoMETS (Random Enhanced Co-eye for Multivariate Time Series), which addresses these challenges. RED CoMETS builds upon the success of Co-eye, an ensemble classifier specifically designed for symbolically represented univariate time series, and extends its capabilities to handle multivariate data. The performance of RED CoMETS is evaluated on benchmark datasets from the UCR archive, where it demonstrates competitive accuracy when compared to state-of-the-art techniques in multivariate settings. Notably, it achieves the highest reported accuracy in the literature for the 'HandMovementDirection' dataset. Moreover, the proposed method significantly reduces computation time compared to Co-eye, making it an efficient and effective choice for multivariate time series classification.

CVJan 31, 2023
Transfer Learning and Class Decomposition for Detecting the Cognitive Decline of Alzheimer Disease

Maha M. Alwuthaynani, Zahraa S. Abdallah, Raul Santos-Rodriguez

Early diagnosis of Alzheimer's disease (AD) is essential in preventing the disease's progression. Therefore, detecting AD from neuroimaging data such as structural magnetic resonance imaging (sMRI) has been a topic of intense investigation in recent years. Deep learning has gained considerable attention in Alzheimer's detection. However, training a convolutional neural network from scratch is challenging since it demands more computational time and a significant amount of annotated data. By transferring knowledge learned from other image recognition tasks to medical image classification, transfer learning can provide a promising and effective solution. Irregularities in the dataset distribution present another difficulty. Class decomposition can tackle this issue by simplifying learning a dataset's class boundaries. Motivated by these approaches, this paper proposes a transfer learning method using class decomposition to detect Alzheimer's disease from sMRI images. We use two ImageNet-trained architectures: VGG19 and ResNet50, and an entropy-based technique to determine the most informative images. The proposed model achieved state-of-the-art performance in the Alzheimer's disease (AD) vs mild cognitive impairment (MCI) vs cognitively normal (CN) classification task with a 3\% increase in accuracy from what is reported in the literature.

CLOct 3, 2025Code
Semantic Similarity in Radiology Reports via LLMs and NER

Beth Pearson, Ahmed Adnan, Zahraa S. Abdallah

Radiology report evaluation is a crucial part of radiologists' training and plays a key role in ensuring diagnostic accuracy. As part of the standard reporting workflow, a junior radiologist typically prepares a preliminary report, which is then reviewed and edited by a senior radiologist to produce the final report. Identifying semantic differences between preliminary and final reports is essential for junior doctors, both as a training tool and to help uncover gaps in clinical knowledge. While AI in radiology is a rapidly growing field, the application of large language models (LLMs) remains challenging due to the need for specialised domain knowledge. In this paper, we explore the ability of LLMs to provide explainable and accurate comparisons of reports in the radiology domain. We begin by comparing the performance of several LLMs in comparing radiology reports. We then assess a more traditional approach based on Named-Entity-Recognition (NER). However, both approaches exhibit limitations in delivering accurate feedback on semantic similarity. To address this, we propose Llama-EntScore, a semantic similarity scoring method using a combination of Llama 3.1 and NER with tunable weights to emphasise or de-emphasise specific types of differences. Our approach generates a quantitative similarity score for tracking progress and also gives an interpretation of the score that aims to offer valuable guidance in reviewing and refining their reporting. We find our method achieves 67% exact-match accuracy and 93% accuracy within +/- 1 when compared to radiologist-provided ground truth scores - outperforming both LLMs and NER used independently. Code is available at: https://github.com/otmive/llama_reports

LGNov 14, 2025
Epistemic Error Decomposition for Multi-step Time Series Forecasting: Rethinking Bias-Variance in Recursive and Direct Strategies

Riku Green, Huw Day, Zahraa S. Abdallah et al.

Multi-step forecasting is often described through a simple rule of thumb: recursive strategies are said to have high bias and low variance, while direct strategies are said to have low bias and high variance. We revisit this belief by decomposing the expected multi-step forecast error into three parts: irreducible noise, a structural approximation gap, and an estimation-variance term. For linear predictors we show that the structural gap is identically zero for any dataset. For nonlinear predictors, however, the repeated composition used in recursion can increase model expressivity, making the structural gap depend on both the model and the data. We further show that the estimation variance of the recursive strategy at any horizon can be written as the one-step variance multiplied by a Jacobian-based amplification factor that measures how sensitive the composed predictor is to parameter error. This perspective explains when recursive forecasting may simultaneously have lower bias and higher variance than direct forecasting. Experiments with multilayer perceptrons on the ETTm1 dataset confirm these findings. The results offer practical guidance for choosing between recursive and direct strategies based on model nonlinearity and noise characteristics, rather than relying on traditional bias-variance intuition.

AIApr 4, 2025
Towards deployment-centric multimodal AI beyond vision and language

Xianyuan Liu, Jiayang Zhang, Shuo Zhou et al.

Multimodal artificial intelligence (AI) integrates diverse types of data via machine learning to improve understanding, prediction, and decision-making across disciplines such as healthcare, science, and engineering. However, most multimodal AI advances focus on models for vision and language data, while their deployability remains a key challenge. We advocate a deployment-centric workflow that incorporates deployment constraints early to reduce the likelihood of undeployable solutions, complementing data-centric and model-centric approaches. We also emphasise deeper integration across multiple levels of multimodality and multidisciplinary collaboration to significantly broaden the research scope beyond vision and language. To facilitate this approach, we identify common multimodal-AI-specific challenges shared across disciplines and examine three real-world use cases: pandemic response, self-driving car design, and climate change adaptation, drawing expertise from healthcare, social science, engineering, science, sustainability, and finance. By fostering multidisciplinary dialogue and open research practices, our community can accelerate deployment-centric development for broad societal impact.

LGDec 17, 2025
A Regime-Aware Fusion Framework for Time Series Classification

Honey Singh Chauhan, Zahraa S. Abdallah

Kernel-based methods such as Rocket are among the most effective default approaches for univariate time series classification (TSC), yet they do not perform equally well across all datasets. We revisit the long-standing intuition that different representations capture complementary structure and show that selectively fusing them can yield consistent improvements over Rocket on specific, systematically identifiable kinds of datasets. We introduce Fusion-3 (F3), a lightweight framework that adaptively fuses Rocket, SAX, and SFA representations. To understand when fusion helps, we cluster UCR datasets into six groups using meta-features capturing series length, spectral structure, roughness, and class imbalance, and treat these clusters as interpretable data-structure regimes. Our analysis shows that fusion typically outperforms strong baselines in regimes with structured variability or rich frequency content, while offering diminishing returns in highly irregular or outlier-heavy settings. To support these findings, we combine three complementary analyses: non-parametric paired statistics across datasets, ablation studies isolating the roles of individual representations, and attribution via SHAP to identify which dataset properties predict fusion gains. Sample-level case studies further reveal the underlying mechanism: fusion primarily improves performance by rescuing specific errors, with adaptive increases in frequency-domain weighting precisely where corrections occur. Using 5-fold cross-validation on the 113 UCR datasets, F3 yields small but consistent average improvements over Rocket, supported by frequentist and Bayesian evidence and accompanied by clearly identifiable failure cases. Our results show that selectively applied fusion provides dependable and interpretable extension to strong kernel-based methods, correcting their weaknesses precisely where the data support it.

HCMay 7, 2025
BrisT1D Dataset: Young Adults with Type 1 Diabetes in the UK using Smartwatches

Sam Gordon James, Miranda Elaine Glynis Armstrong, Aisling Ann O'Kane et al.

Background: Type 1 diabetes (T1D) has seen a rapid evolution in management technology and forms a useful case study for the future management of other chronic conditions. Further development of this management technology requires an exploration of its real-world use and the potential of additional data streams. To facilitate this, we contribute the BrisT1D Dataset to the growing number of public T1D management datasets. The dataset was developed from a longitudinal study of 24 young adults in the UK who used a smartwatch alongside their usual T1D management. Findings: The BrisT1D dataset features both device data from the T1D management systems and smartwatches used by participants, as well as transcripts of monthly interviews and focus groups conducted during the study. The device data is provided in a processed state, for usability and more rapid analysis, and in a raw state, for in-depth exploration of novel insights captured in the study. Conclusions: This dataset has a range of potential applications. The quantitative elements can support blood glucose prediction, hypoglycaemia prediction, and closed-loop algorithm development. The qualitative elements enable the exploration of user experiences and opinions, as well as broader mixed-methods research into the role of smartwatches in T1D management.

LGMay 1, 2025
KnowEEG: Explainable Knowledge Driven EEG Classification

Amarpal Sahota, Navid Mohammadi Foumani, Raul Santos-Rodriguez et al.

Electroencephalography (EEG) is a method of recording brain activity that shows significant promise in applications ranging from disease classification to emotion detection and brain-computer interfaces. Recent advances in deep learning have improved EEG classification performance yet model explainability remains an issue. To address this key limitation of explainability we introduce KnowEEG; a novel explainable machine learning approach for EEG classification. KnowEEG extracts a comprehensive set of per-electrode features, filters them using statistical tests, and integrates between-electrode connectivity statistics. These features are then input to our modified Random Forest model (Fusion Forest) that balances per electrode statistics with between electrode connectivity features in growing the trees of the forest. By incorporating knowledge from both the generalized time-series and EEG-specific domains, KnowEEG achieves performance comparable to or exceeding state-of-the-art deep learning models across five different classification tasks: emotion detection, mental workload classification, eyes open/closed detection, abnormal EEG classification, and event detection. In addition to high performance, KnowEEG provides inherent explainability through feature importance scores for understandable features. We demonstrate by example on the eyes closed/open classification task that this explainability can be used to discover knowledge about the classes. This discovered knowledge for eyes open/closed classification was proven to be correct by current neuroscience literature. Therefore, the impact of KnowEEG will be significant for domains where EEG explainability is critical such as healthcare.

CLJan 25, 2021
Transfer Learning Approach for Detecting Psychological Distress in Brexit Tweets

Sean-Kelly Palicki, Shereen Fouad, Mariam Adedoyin-Olowe et al.

In 2016, United Kingdom (UK) citizens voted to leave the European Union (EU), which was officially implemented in 2020. During this period, UK residents experienced a great deal of uncertainty around the UK's continued relationship with the EU. Many people have used social media platforms to express their emotions about this critical event. Sentiment analysis has been recently considered as an important tool for detecting mental well-being in Twitter contents. However, detecting the psychological distress status in political-related tweets is a challenging task due to the lack of explicit sentences describing the depressive or anxiety status. To address this problem, this paper leverages a transfer learning approach for sentiment analysis to measure the non-clinical psychological distress status in Brexit tweets. The framework transfers the knowledge learnt from self-reported psychological distress tweets (source domain) to detect the distress status in Brexit tweets (target domain). The framework applies a domain adaptation technique to decrease the impact of negative transfer between source and target domains. The paper also introduces a Brexit distress index that can be used to detect levels of psychological distress of individuals in Brexit tweets. We design an experiment that includes data from both domains. The proposed model is able to detect the non-clinical psychological distress status in Brexit tweets with an accuracy of 66% and 62% on the source and target domains, respectively.

LGApr 14, 2020
Co-eye: A Multi-resolution Symbolic Representation to TimeSeries Diversified Ensemble Classification

Zahraa S. Abdallah, Mohamed Medhat Gaber

Time series classification (TSC) is a challenging task that attracted many researchers in the last few years. One main challenge in TSC is the diversity of domains where time series data come from. Thus, there is no "one model that fits all" in TSC. Some algorithms are very accurate in classifying a specific type of time series when the whole series is considered, while some only target the existence/non-existence of specific patterns/shapelets. Yet other techniques focus on the frequency of occurrences of discriminating patterns/features. This paper presents a new classification technique that addresses the inherent diversity problem in TSC using a nature-inspired method. The technique is stimulated by how flies look at the world through "compound eyes" that are made up of thousands of lenses, called ommatidia. Each ommatidium is an eye with its own lens, and thousands of them together create a broad field of vision. The developed technique similarly uses different lenses and representations to look at the time series, and then combines them for broader visibility. These lenses have been created through hyper-parameterisation of symbolic representations (Piecewise Aggregate and Fourier approximations). The algorithm builds a random forest for each lens, then performs soft dynamic voting for classifying new instances using the most confident eyes, i.e, forests. We evaluate the new technique, coined Co-eye, using the recently released extended version of UCR archive, containing more than 100 datasets across a wide range of domains. The results show the benefits of bringing together different perspectives reflecting on the accuracy and robustness of Co-eye in comparison to other state-of-the-art techniques.

LGApr 8, 2020
DeepStreamCE: A Streaming Approach to Concept Evolution Detection in Deep Neural Networks

Lorraine Chambers, Mohamed Medhat Gaber, Zahraa S. Abdallah

Deep neural networks have experimentally demonstrated superior performance over other machine learning approaches in decision-making predictions. However, one major concern is the closed set nature of the classification decision on the trained classes, which can have serious consequences in safety critical systems. When the deep neural network is in a streaming environment, fast interpretation of this classification is required to determine if the classification result is trusted. Un-trusted classifications can occur when the input data to the deep neural network changes over time. One type of change that can occur is concept evolution, where a new class is introduced that the deep neural network was not trained on. In the majority of deep neural network architectures, the only option is to assign this instance to one of the classes it was trained on, which would be incorrect. The aim of this research is to detect the arrival of a new class in the stream. Existing work on interpreting deep neural networks often focuses on neuron activations to provide visual interpretation and feature extraction. Our novel approach, coined DeepStreamCE, uses streaming approaches for real-time concept evolution detection in deep neural networks. DeepStreamCE applies neuron activation reduction using an autoencoder and MCOD stream-based clustering in the offline phase. Both outputs are used in the online phase to analyse the neuron activations in the evolving stream in order to detect concept evolution occurrence in real time. We evaluate DeepStreamCE by training VGG16 convolutional neural networks on combinations of data from the CIFAR-10 dataset, holding out some classes to be used as concept evolution. For comparison, we apply the data and VGG16 networks to an open-set deep network solution - OpenMax. DeepStreamCE outperforms OpenMax when identifying concept evolution for our datasets.