CVMay 29Code
SUPREME: A Multi-GPU Framework for Reproducible Image Unlearning Method EvaluationPetros Andreou, Jamie Lanyon, Axel Finke et al.
Machine unlearning removes the influence of specific training data from a trained model without retraining it from scratch. Evaluating an unlearning method requires repeating training, unlearning, and evaluation across multiple seeds, which is computationally expensive. To our knowledge, existing image classification unlearning frameworks run on a single GPU, which limits how many seeds can be evaluated in reasonable time. We introduce SUPREME, an open-source framework that distributes these stages across multiple GPUs. SUPREME makes three contributions: a registry-based design for adding new methods, metrics, models, and scenarios; a multi-GPU architecture supporting multiple accelerators and precision modes; and a demonstration on Pins Face Recognition using ResNet18 and ViT under full-class and random-sample unlearning across ten seeds. The framework is available at https://github.com/pedroandreou/supreme-unlearning.
AIMay 26
RULER: Representation-Level Verification of Machine UnlearningGeorgina Cosma, Axel Finke
Machine unlearning aims to remove the influence of specific training records from a deployed model without retraining from scratch. Current protocols verify this at the output level through membership inference, retain accuracy, and forget-set accuracy, but a model can satisfy all three whilst still encoding forgotten records in its intermediate representations. We introduce RULER, a set of representation-level verification metrics. The oracle-comparative metric M2 measures whether forget-set records occupy the same representational position as in a model retrained without them. The oracle-free metric M4 detects residuals from the unlearned model's internal similarity structure alone, without retraining. Four approximate unlearning methods all pass output-level evaluation, yet under a linear mixed-effects model M2 detects significant residuals in 10 of 12 conditions (p<0.05), with effect sizes growing as the forget fraction increases. A fifth method, Bad Teacher, shows the same residuals despite a different forgetting mechanism. M4 acts as a pre-unlearning diagnostic across tabular, image, clinical text, and face-identity settings: it detects identity-level memorisation in face recognition models where no tested method fully erases the signal.
SEOct 4, 2022Code
Multifaceted Hierarchical Report Identification for Non-Functional Bugs in Deep Learning FrameworksGuoming Long, Tao Chen, Georgina Cosma
Non-functional bugs (e.g., performance- or accuracy-related bugs) in Deep Learning (DL) frameworks can lead to some of the most devastating consequences. Reporting those bugs on a repository such as GitHub is a standard route to fix them. Yet, given the growing number of new GitHub reports for DL frameworks, it is intrinsically difficult for developers to distinguish those that reveal non-functional bugs among the others, and assign them to the right contributor for investigation in a timely manner. In this paper, we propose MHNurf - an end-to-end tool for automatically identifying non-functional bug related reports in DL frameworks. The core of MHNurf is a Multifaceted Hierarchical Attention Network (MHAN) that tackles three unaddressed challenges: (1) learning the semantic knowledge, but doing so by (2) considering the hierarchy (e.g., words/tokens in sentences/statements) and focusing on the important parts (i.e., words, tokens, sentences, and statements) of a GitHub report, while (3) independently extracting information from different types of features, i.e., content, comment, code, command, and label. To evaluate MHNurf, we leverage 3,721 GitHub reports from five DL frameworks for conducting experiments. The results show that MHNurf works the best with a combination of content, comment, and code, which considerably outperforms the classic HAN where only the content is used. MHNurf also produces significantly more accurate results than nine other state-of-the-art classifiers with strong statistical significance, i.e., up to 71% AUC improvement and has the best Scott-Knott rank on four frameworks while 2nd on the remaining one. To facilitate reproduction and promote future research, we have made our dataset, code, and detailed supplementary results publicly available at: https://github.com/ideas-labo/APSEC2022-MHNurf.
CVOct 10, 2022
Improving Visual-Semantic Embeddings by Learning Semantically-Enhanced Hard Negatives for Cross-modal Information RetrievalYan Gong, Georgina Cosma
Visual Semantic Embedding (VSE) aims to extract the semantics of images and their descriptions, and embed them into the same latent space for cross-modal information retrieval. Most existing VSE networks are trained by adopting a hard negatives loss function which learns an objective margin between the similarity of relevant and irrelevant image-description embedding pairs. However, the objective margin in the hard negatives loss function is set as a fixed hyperparameter that ignores the semantic differences of the irrelevant image-description pairs. To address the challenge of measuring the optimal similarities between image-description pairs before obtaining the trained VSE networks, this paper presents a novel approach that comprises two main parts: (1) finds the underlying semantics of image descriptions; and (2) proposes a novel semantically enhanced hard negatives loss function, where the learning objective is dynamically determined based on the optimal similarity scores between irrelevant image-description pairs. Extensive experiments were carried out by integrating the proposed methods into five state-of-the-art VSE networks that were applied to three benchmark datasets for cross-modal information retrieval tasks. The results revealed that the proposed methods achieved the best performance and can also be adopted by existing and future VSE networks.
CVFeb 13, 2023
VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information RetrievalYan Gong, Georgina Cosma, Axel Finke
The relations expressed in user queries are vital for cross-modal information retrieval. Relation-focused cross-modal retrieval aims to retrieve information that corresponds to these relations, enabling effective retrieval across different modalities. Pre-trained networks, such as Contrastive Language-Image Pre-training (CLIP), have gained significant attention and acclaim for their exceptional performance in various cross-modal learning tasks. However, the Vision Transformer (ViT) used in these networks is limited in its ability to focus on image region relations. Specifically, ViT is trained to match images with relevant descriptions at the global level, without considering the alignment between image regions and descriptions. This paper introduces VITR, a novel network that enhances ViT by extracting and reasoning about image region relations based on a local encoder. VITR is comprised of two key components. Firstly, it extends the capabilities of ViT-based cross-modal networks by enabling them to extract and reason with region relations present in images. Secondly, VITR incorporates a fusion module that combines the reasoned results with global knowledge to predict similarity scores between images and descriptions. The proposed VITR network was evaluated through experiments on the tasks of relation-focused cross-modal information retrieval. The results derived from the analysis of the RefCOCOg, CLEVR, and Flickr30K datasets demonstrated that the proposed VITR network consistently outperforms state-of-the-art networks in image-to-text and text-to-image retrieval.
CVJul 21, 2023
Morphological Image Analysis and Feature Extraction for Reasoning with AI-based Defect Detection and Classification ModelsJiajun Zhang, Georgina Cosma, Sarah Bugby et al.
As the use of artificial intelligent (AI) models becomes more prevalent in industries such as engineering and manufacturing, it is essential that these models provide transparent reasoning behind their predictions. This paper proposes the AI-Reasoner, which extracts the morphological characteristics of defects (DefChars) from images and utilises decision trees to reason with the DefChar values. Thereafter, the AI-Reasoner exports visualisations (i.e. charts) and textual explanations to provide insights into outputs made by masked-based defect detection and classification models. It also provides effective mitigation strategies to enhance data pre-processing and overall model performance. The AI-Reasoner was tested on explaining the outputs of an IE Mask R-CNN model using a set of 366 images containing defects. The results demonstrated its effectiveness in explaining the IE Mask R-CNN model's predictions. Overall, the proposed AI-Reasoner provides a solution for improving the performance of AI models in industrial applications that require defect analysis.
IRAug 16, 2023
Advancing continual lifelong learning in neural information retrieval: definition, dataset, framework, and empirical evaluationJingrui Hou, Georgina Cosma, Axel Finke
Continual learning refers to the capability of a machine learning model to learn and adapt to new information, without compromising its performance on previously learned tasks. Although several studies have investigated continual learning methods for information retrieval tasks, a well-defined task formulation is still lacking, and it is unclear how typical learning strategies perform in this context. To address this challenge, a systematic task formulation of continual neural information retrieval is presented, along with a multiple-topic dataset that simulates continuous information retrieval. A comprehensive continual neural information retrieval framework consisting of typical retrieval models and continual learning strategies is then proposed. Empirical evaluations illustrate that the proposed framework can successfully prevent catastrophic forgetting in neural information retrieval and enhance performance on previously learned tasks. The results indicate that embedding-based retrieval models experience a decline in their continual learning performance as the topic shift distance and dataset volume of new tasks increase. In contrast, pretraining-based models do not show any such correlation. Adopting suitable learning strategies can mitigate the effects of topic shift and data augmentation.
AIJul 11, 2024
Unveiling Disparities in Maternity Care: A Topic Modelling Approach to Analysing Maternity Incident Investigation ReportsGeorgina Cosma, Mohit Kumar Singh, Patrick Waterson et al.
This study applies Natural Language Processing techniques, including Latent Dirichlet Allocation, to analyse anonymised maternity incident investigation reports from the Healthcare Safety Investigation Branch. The reports underwent preprocessing, annotation using the Safety Intelligence Research taxonomy, and topic modelling to uncover prevalent topics and detect differences in maternity care across ethnic groups. A combination of offline and online methods was utilised to ensure data protection whilst enabling advanced analysis, with offline processing for sensitive data and online processing for non-sensitive data using the `Claude 3 Opus' language model. Interactive topic analysis and semantic network visualisation were employed to extract and display thematic topics and visualise semantic relationships among keywords. The analysis revealed disparities in care among different ethnic groups, with distinct focus areas for the Black, Asian, and White British ethnic groups. The study demonstrates the effectiveness of topic modelling and NLP techniques in analysing maternity incident investigation reports and highlighting disparities in care. The findings emphasise the crucial role of advanced data analysis in improving maternity care quality and equity.
AIJul 11, 2024
Intelligent Multi-Document Summarisation for Extracting Insights on Racial Inequalities from Maternity Incident Investigation ReportsGeorgina Cosma, Mohit Kumar Singh, Patrick Waterson et al.
In healthcare, thousands of safety incidents occur every year, but learning from these incidents is not effectively aggregated. Analysing incident reports using AI could uncover critical insights to prevent harm by identifying recurring patterns and contributing factors. To aggregate and extract valuable information, natural language processing (NLP) and machine learning techniques can be employed to summarise and mine unstructured data, potentially surfacing systemic issues and priority areas for improvement. This paper presents I-SIRch:CS, a framework designed to facilitate the aggregation and analysis of safety incident reports while ensuring traceability throughout the process. The framework integrates concept annotation using the Safety Intelligence Research (SIRch) taxonomy with clustering, summarisation, and analysis capabilities. Utilising a dataset of 188 anonymised maternity investigation reports annotated with 27 SIRch human factors concepts, I-SIRch:CS groups the annotated sentences into clusters using sentence embeddings and k-means clustering, maintaining traceability via file and sentence IDs. Summaries are generated for each cluster using offline state-of-the-art abstractive summarisation models (BART, DistilBART, T5), which are evaluated and compared using metrics assessing summary quality attributes. The generated summaries are linked back to the original file and sentence IDs, ensuring traceability and allowing for verification of the summarised information. Results demonstrate BART's strengths in creating informative and concise summaries.
IROct 10, 2023
Efficient Retrieval of Images with Irregular Patterns using Morphological Image Analysis: Applications to Industrial and Healthcare datasetsJiajun Zhang, Georgina Cosma, Sarah Bugby et al.
Image retrieval is the process of searching and retrieving images from a database based on their visual content and features. Recently, much attention has been directed towards the retrieval of irregular patterns within industrial or medical images by extracting features from the images, such as deep features, colour-based features, shape-based features and local features. This has applications across a spectrum of industries, including fault inspection, disease diagnosis, and maintenance prediction. This paper proposes an image retrieval framework to search for images containing similar irregular patterns by extracting a set of morphological features (DefChars) from images; the datasets employed in this paper contain wind turbine blade images with defects, chest computerised tomography scans with COVID-19 infection, heatsink images with defects, and lake ice images. The proposed framework was evaluated with different feature extraction methods (DefChars, resized raw image, local binary pattern, and scale-invariant feature transforms) and distance metrics to determine the most efficient parameters in terms of retrieval performance across datasets. The retrieval results show that the proposed framework using the DefChars and the Manhattan distance metric achieves a mean average precision of 80% and a low standard deviation of 0.09 across classes of irregular patterns, outperforming alternative feature-metric combinations across all datasets. Furthermore, the low standard deviation between each class highlights DefChars' capability for a reliable image retrieval task, even in the presence of class imbalances or small-sized datasets.
LGJul 13, 2023
Identifying Early Help Referrals For Local Authorities With Machine Learning And Bias AnalysisEufrásio de A. Lima Neto, Jonathan Bailiss, Axel Finke et al.
Local authorities in England, such as Leicestershire County Council (LCC), provide Early Help services that can be offered at any point in a young person's life when they experience difficulties that cannot be supported by universal services alone, such as schools. This paper investigates the utilisation of machine learning (ML) to assist experts in identifying families that may need to be referred for Early Help assessment and support. LCC provided an anonymised dataset comprising 14360 records of young people under the age of 18. The dataset was pre-processed, machine learning models were build, and experiments were conducted to validate and test the performance of the models. Bias mitigation techniques were applied to improve the fairness of these models. During testing, while the models demonstrated the capability to identify young people requiring intervention or early help, they also produced a significant number of false positives, especially when constructed with imbalanced data, incorrectly identifying individuals who most likely did not need an Early Help referral. This paper empirically explores the suitability of data-driven ML models for identifying young people who may require Early Help services and discusses their appropriateness and limitations for this task.
AIJul 25, 2023
ForestMonkey: Toolkit for Reasoning with AI-based Defect Detection and Classification ModelsJiajun Zhang, Georgina Cosma, Sarah Bugby et al.
Artificial intelligence (AI) reasoning and explainable AI (XAI) tasks have gained popularity recently, enabling users to explain the predictions or decision processes of AI models. This paper introduces Forest Monkey (FM), a toolkit designed to reason the outputs of any AI-based defect detection and/or classification model with data explainability. Implemented as a Python package, FM takes input in the form of dataset folder paths (including original images, ground truth labels, and predicted labels) and provides a set of charts and a text file to illustrate the reasoning results and suggest possible improvements. The FM toolkit consists of processes such as feature extraction from predictions to reasoning targets, feature extraction from images to defect characteristics, and a decision tree-based AI-Reasoner. Additionally, this paper investigates the time performance of the FM toolkit when applied to four AI models with different datasets. Lastly, a tutorial is provided to guide users in performing reasoning tasks using the FM toolkit.
LGApr 20, 2023
A Meta-heuristic Approach to Estimate and Explain Classifier UncertaintyAndrew Houston, Georgina Cosma
Trust is a crucial factor affecting the adoption of machine learning (ML) models. Qualitative studies have revealed that end-users, particularly in the medical domain, need models that can express their uncertainty in decision-making allowing users to know when to ignore the model's recommendations. However, existing approaches for quantifying decision-making uncertainty are not model-agnostic, or they rely on complex statistical derivations that are not easily understood by laypersons or end-users, making them less useful for explaining the model's decision-making process. This work proposes a set of class-independent meta-heuristics that can characterize the complexity of an instance in terms of factors are mutually relevant to both human and ML decision-making. The measures are integrated into a meta-learning framework that estimates the risk of misclassification. The proposed framework outperformed predicted probabilities in identifying instances at risk of being misclassified. The proposed measures and framework hold promise for improving model development for more complex instances, as well as providing a new means of model abstention and explanation.
IRAug 9, 2024
Neural Machine UnrankingJingrui Hou, Axel Finke, Georgina Cosma
We address the problem of machine unlearning in neural information retrieval (IR), introducing a novel task termed Neural Machine UnRanking (NuMuR). This problem is motivated by growing demands for data privacy compliance and selective information removal in neural IR systems. Existing task- or model- agnostic unlearning approaches, primarily designed for classification tasks, are suboptimal for NuMuR due to two core challenges: (1) neural rankers output unnormalised relevance scores rather than probability distributions, limiting the effectiveness of traditional teacher-student distillation frameworks; and (2) entangled data scenarios, where queries and documents appear simultaneously across both forget and retain sets, may degrade retention performance in existing methods. To address these issues, we propose Contrastive and Consistent Loss (CoCoL), a dual-objective framework. CoCoL comprises (1) a contrastive loss that reduces relevance scores on forget sets while maintaining performance on entangled samples, and (2) a consistent loss that preserves accuracy on retain set. Extensive experiments on MS MARCO and TREC CAR datasets, across four neural IR models, demonstrate that CoCoL achieves substantial forgetting with minimal retain and generalisation performance loss. Our method facilitates more effective and controllable data removal than existing techniques.
LGOct 30, 2025
On the limitation of evaluating machine unlearning using only a single training seedJamie Lanyon, Axel Finke, Petros Andreou et al.
Machine unlearning (MU) aims to remove the influence of certain data points from a trained model without costly retraining. Most practical MU algorithms are only approximate and their performance can only be assessed empirically. Care must therefore be taken to make empirical comparisons as representative as possible. A common practice is to run the MU algorithm multiple times independently starting from the same trained model. In this work, we demonstrate that this practice can give highly non-representative results because -- even for the same architecture and same dataset -- some MU methods can be highly sensitive to the choice of random number seed used for model training. We therefore recommend that empirical comparisons of MU algorithms should also reflect the variability across different model training seeds.
IRJul 29, 2024
FiCo-ITR: bridging fine-grained and coarse-grained image-text retrieval for comparative performance analysisMikel Williams-Lekuona, Georgina Cosma
In the field of Image-Text Retrieval (ITR), recent advancements have leveraged large-scale Vision-Language Pretraining (VLP) for Fine-Grained (FG) instance-level retrieval, achieving high accuracy at the cost of increased computational complexity. For Coarse-Grained (CG) category-level retrieval, prominent approaches employ Cross-Modal Hashing (CMH) to prioritise efficiency, albeit at the cost of retrieval performance. Due to differences in methodologies, FG and CG models are rarely compared directly within evaluations in the literature, resulting in a lack of empirical data quantifying the retrieval performance-efficiency tradeoffs between the two. This paper addresses this gap by introducing the FiCo-ITR library, which standardises evaluation methodologies for both FG and CG models, facilitating direct comparisons. We conduct empirical evaluations of representative models from both subfields, analysing precision, recall, and computational complexity across varying data scales. Our findings offer new insights into the performance-efficiency trade-offs between recent representative FG and CG models, highlighting their respective strengths and limitations. These findings provide the foundation necessary to make more informed decisions regarding model selection for specific retrieval tasks and highlight avenues for future research into hybrid systems that leverage the strengths of both FG and CG approaches.
IRNov 13, 2024
Neural Corrective Machine UnrankingJingrui Hou, Axel Finke, Georgina Cosma
Machine unlearning in neural information retrieval (IR) systems requires removing specific data whilst maintaining model performance. Applying existing machine unlearning methods to IR may compromise retrieval effectiveness or inadvertently expose unlearning actions due to the removal of particular items from the retrieved results presented to users. We formalise corrective unranking, which extends machine unlearning in (neural) IR context by integrating substitute documents to preserve ranking integrity, and propose a novel teacher-student framework, Corrective unRanking Distillation (CuRD), for this task. CuRD (1) facilitates forgetting by adjusting the (trained) neural IR model such that its output relevance scores of to-be-forgotten samples mimic those of low-ranking, non-retrievable samples; (2) enables correction by fine-tuning the relevance scores for the substitute samples to match those of corresponding to-be-forgotten samples closely; (3) seeks to preserve performance on samples that are not targeted for forgetting. We evaluate CuRD on four neural IR models (BERTcat, BERTdot, ColBERT, PARADE) using MS MARCO and TREC CAR datasets. Experiments with forget set sizes from 1 % and 20 % of the training dataset demonstrate that CuRD outperforms seven state-of-the-art baselines in terms of forgetting and correction while maintaining model retention and generalisation capabilities.
CYOct 28, 2024
Temporal Patterns of Multiple Long-Term Conditions in Individuals with Intellectual Disability Living in Wales: An Unsupervised Clustering Approach to Disease TrajectoriesRania Kousovista, Georgina Cosma, Emeka Abakasanga et al.
Identifying and understanding the co-occurrence of multiple long-term conditions (MLTC) in individuals with intellectual disabilities (ID) is vital for effective healthcare management. These individuals often face earlier onset and higher prevalence of MLTCs, yet specific co-occurrence patterns remain unexplored. This study applies an unsupervised approach to characterise MLTC clusters based on shared disease trajectories using electronic health records (EHRs) from 13069 individuals with ID in Wales (2000-2021). Disease associations and temporal directionality were assessed, followed by spectral clustering to group shared trajectories. The population consisted of 52.3% males and 47.7% females, with an average of 4.5 conditions per patient. Males under 45 formed a single cluster dominated by neurological conditions (32.4%), while males above 45 had three clusters, the largest characterised circulatory (51.8%). Females under 45 formed one cluster with digestive conditions (24.6%) as most prevalent, while those aged 45 and older showed two clusters: one dominated by circulatory (34.1%), and the other by digestive (25.9%) and musculoskeletal (21.9%) system conditions. Mental illness, epilepsy, and reflux were common across groups. These clusters offer insights into disease progression in individuals with ID, informing targeted interventions and personalised healthcare strategies.
SDJan 26
A Framework for Evaluating Faithfulness in Explainable AI for Machine Anomalous Sound Detection Using Frequency-Band PerturbationAlexander Buck, Georgina Cosma, Iain Phillips et al.
Explainable AI (XAI) is commonly applied to anomalous sound detection (ASD) models to identify which time-frequency regions of an audio signal contribute to an anomaly decision. However, most audio explanations rely on qualitative inspection of saliency maps, leaving open the question of whether these attributions accurately reflect the spectral cues the model uses. In this work, we introduce a new quantitative framework for evaluating XAI faithfulness in machine-sound analysis by directly linking attribution relevance to model behaviour through systematic frequency-band removal. This approach provides an objective measure of whether an XAI method for machine ASD correctly identifies frequency regions that influence an ASD model's predictions. By using four widely adopted methods, namely Integrated Gradients, Occlusion, Grad-CAM and SmoothGrad, we show that XAI techniques differ in reliability, with Occlusion demonstrating the strongest alignment with true model sensitivity and gradient-+based methods often failing to accurately capture spectral dependencies. The proposed framework offers a reproducible way to benchmark audio explanations and enables more trustworthy interpretation of spectrogram-based ASD systems.
IRDec 17, 2025
Image Complexity-Aware Adaptive Retrieval for Efficient Vision-Language ModelsMikel Williams-Lekuona, Georgina Cosma
Vision transformers in vision-language models typically use the same amount of compute for every image, regardless of whether it is simple or complex. We propose ICAR (Image Complexity-Aware Retrieval), an adaptive computation approach that enables vision transformers to use less compute for simple images whilst processing complex images through their full network depth. The key challenge is maintaining cross-modal alignment: embeddings from different processing depths must remain compatible for text matching. ICAR solves this through dual-path training that produces compatible embeddings from both the early-exit and full-depth paths. This maintains compatibility between image representations and text embeddings in the same semantic space, whether an image exits early or processes fully. Unlike existing two-stage approaches that require expensive reranking, ICAR enables direct image-text matching without additional overhead. To determine how much compute to use, we develop ConvNeXt-IC, which treats image complexity assessment as a classification task. By applying modern classifier backbones rather than specialised architectures, ConvNeXt-IC achieves state-of-the-art performance, attaining a Pearson correlation coefficient of 0.959 with human labelling whilst delivering 4.4x faster complexity prediction. Evaluated on standard benchmarks augmented with real-world web data, ICAR achieves 20% faster image encoding while maintaining category-level performance and 95% of instance-level performance, enabling sustainable scaling of vision-language systems.
LGNov 3, 2024
Equitable Length of Stay Prediction for Patients with Learning Disabilities and Multiple Long-term Conditions Using Machine LearningEmeka Abakasanga, Rania Kousovista, Georgina Cosma et al.
People with learning disabilities have a higher mortality rate and premature deaths compared to the general public, as reported in published research in the UK and other countries. This study analyses hospitalisations of 9,618 patients identified with learning disabilities and long-term conditions for the population of Wales using electronic health record (EHR) data sources from the SAIL Databank. We describe the demographic characteristics, prevalence of long-term conditions, medication history, hospital visits, and lifestyle history for our study cohort, and apply machine learning models to predict the length of hospital stays for this cohort. The random forest (RF) model achieved an Area Under the Curve (AUC) of 0.759 (males) and 0.756 (females), a false negative rate of 0.224 (males) and 0.229 (females), and a balanced accuracy of 0.690 (males) and 0.689 (females). After examining model performance across ethnic groups, two bias mitigation algorithms (threshold optimization and the reductions algorithm using an exponentiated gradient) were applied to minimise performance discrepancies. The threshold optimizer algorithm outperformed the reductions algorithm, achieving lower ranges in false positive rate and balanced accuracy for the male cohort across the ethnic groups. This study demonstrates the potential of applying machine learning models with effective bias mitigation approaches on EHR data sources to enable equitable prediction of hospital stays by addressing data imbalances across groups.
IRJun 8, 2024
I-SIRch: AI-Powered Concept Annotation Tool For Equitable Extraction And Analysis Of Safety Insights From Maternity InvestigationsMohit Kumar Singh, Georgina Cosma, Patrick Waterson et al.
Maternity care is a complex system involving treatments and interactions between patients, providers, and the care environment. To improve patient safety and outcomes, understanding the human factors (e.g. individuals decisions, local facilities) influencing healthcare delivery is crucial. However, most current tools for analysing healthcare data focus only on biomedical concepts (e.g. health conditions, procedures and tests), overlooking the importance of human factors. We developed a new approach called I-SIRch, using artificial intelligence to automatically identify and label human factors concepts in maternity healthcare investigation reports describing adverse maternity incidents produced by England's Healthcare Safety Investigation Branch (HSIB). These incident investigation reports aim to identify opportunities for learning and improving maternal safety across the entire healthcare system. I-SIRch was trained using real data and tested on both real and simulated data to evaluate its performance in identifying human factors concepts. When applied to real reports, the model achieved a high level of accuracy, correctly identifying relevant concepts in 90\% of the sentences from 97 reports. Applying I-SIRch to analyse these reports revealed that certain human factors disproportionately affected mothers from different ethnic groups. Our work demonstrates the potential of using automated tools to identify human factors concepts in maternity incident investigation reports, rather than focusing solely on biomedical concepts. This approach opens up new possibilities for understanding the complex interplay between social, technical, and organisational factors influencing maternal safety and population health outcomes. By taking a more comprehensive view of maternal healthcare delivery, we can develop targeted interventions to address disparities and improve maternal outcomes.
NESep 18, 2019
NatCSNN: A Convolutional Spiking Neural Network for recognition of objects extracted from natural imagesPedro Machado, Georgina Cosma, T. M McGinnity
Biological image processing is performed by complex neural networks composed of thousands of neurons interconnected via thousands of synapses, some of which are excitatory and others inhibitory. Spiking neural models are distinguished from classical neurons by being biological plausible and exhibiting the same dynamics as those observed in biological neurons. This paper proposes a Natural Convolutional Neural Network (NatCSNN) which is a 3-layer bio-inspired Convolutional Spiking Neural Network (CSNN), for classifying objects extracted from natural images. A two-stage training algorithm is proposed using unsupervised Spike Timing Dependent Plasticity (STDP) learning (phase 1) and ReSuMe supervised learning (phase 2). The NatCSNN was trained and tested on the CIFAR-10 dataset and achieved an average testing accuracy of 84.7% which is an improvement over the 2-layer neural networks previously applied to this dataset.