Nirmalie Wiratunga

AI
h-index27
16papers
205citations
Novelty42%
AI Score43

16 Papers

LGDec 5, 2022
This changes to that : Combining causal and non-causal explanations to generate disease progression in capsule endoscopy

Anuja Vats, Ahmed Mohammed, Marius Pedersen et al.

Due to the unequivocal need for understanding the decision processes of deep learning networks, both modal-dependent and model-agnostic techniques have become very popular. Although both of these ideas provide transparency for automated decision making, most methodologies focus on either using the modal-gradients (model-dependent) or ignoring the model internal states and reasoning with a model's behavior/outcome (model-agnostic) to instances. In this work, we propose a unified explanation approach that given an instance combines both model-dependent and agnostic explanations to produce an explanation set. The generated explanations are not only consistent in the neighborhood of a sample but can highlight causal relationships between image content and the outcome. We use Wireless Capsule Endoscopy (WCE) domain to illustrate the effectiveness of our explanations. The saliency maps generated by our approach are comparable or better on the softmax information score.

AINov 11, 2022
Behaviour Trees for Creating Conversational Explanation Experiences

Anjana Wijekoon, David Corsar, Nirmalie Wiratunga

This paper presented an XAI system specification and an interactive dialogue model to facilitate the creation of Explanation Experiences (EE). Such specifications combine the knowledge of XAI, domain and system experts of a use case to formalise target user groups and their explanation needs and to implement explanation strategies to address those needs. Formalising the XAI system promotes the reuse of existing explainers and known explanation needs that can be refined and evolved over time using user evaluation feedback. The abstract EE dialogue model formalised the interactions between a user and an XAI system. The resulting EE conversational chatbot is personalised to an XAI system at run-time using the knowledge captured in its XAI system specification. This seamless integration is enabled by using Behaviour Trees (BT) to conceptualise both the EE dialogue model and the explanation strategies. In the evaluation, we discussed several desirable properties of using BTs over traditionally used STMs or FSMs. BTs promote the reusability of dialogue components through the hierarchical nature of the design. Sub-trees are modular, i.e. a sub-tree is responsible for a specific behaviour, which can be designed in different levels of granularity to improve human interpretability. The EE dialogue model consists of abstract behaviours needed to capture EE, accordingly, it can be implemented as a conversational, graphical or text-based interface which caters to different domains and users. There is a significant computational cost when using BTs for modelling dialogue, which we mitigate by using memory. Overall, we find that the ability to create robust conversational pathways dynamically makes BTs a good candidate for designing and implementing conversation for creating explanation experiences.

CLMay 26, 2022
Clinical Dialogue Transcription Error Correction using Seq2Seq Models

Gayani Nanayakkara, Nirmalie Wiratunga, David Corsar et al.

Good communication is critical to good healthcare. Clinical dialogue is a conversation between health practitioners and their patients, with the explicit goal of obtaining and sharing medical information. This information contributes to medical decision-making regarding the patient and plays a crucial role in their healthcare journey. The reliance on note taking and manual scribing processes are extremely inefficient and leads to manual transcription errors when digitizing notes. Automatic Speech Recognition (ASR) plays a significant role in speech-to-text applications, and can be directly used as a text generator in conversational applications. However, recording clinical dialogue presents a number of general and domain-specific challenges. In this paper, we present a seq2seq learning approach for ASR transcription error correction of clinical dialogues. We introduce a new Gastrointestinal Clinical Dialogue (GCD) Dataset which was gathered by healthcare professionals from a NHS Inflammatory Bowel Disease clinic and use this in a comparative study with four commercial ASR systems. Using self-supervision strategies, we fine-tune a seq2seq model on a mask-filling task using a domain-specific PubMed dataset which we have shared publicly for future research. The BART model fine-tuned for mask-filling was able to correct transcription errors and achieve lower word error rates for three out of four commercial ASR outputs.

LGJul 17, 2024
A Survey of AI-Powered Mini-Grid Solutions for a Sustainable Future in Rural Communities

Craig Pirie, Harsha Kalutarage, Muhammad Shadi Hajar et al.

This paper presents a comprehensive survey of AI-driven mini-grid solutions aimed at enhancing sustainable energy access. It emphasises the potential of mini-grids, which can operate independently or in conjunction with national power grids, to provide reliable and affordable electricity to remote communities. Given the inherent unpredictability of renewable energy sources such as solar and wind, the necessity for accurate energy forecasting and management is discussed, highlighting the role of advanced AI techniques in forecasting energy supply and demand, optimising grid operations, and ensuring sustainable energy distribution. This paper reviews various forecasting models, including statistical methods, machine learning algorithms, and hybrid approaches, evaluating their effectiveness for both short-term and long-term predictions. Additionally, it explores public datasets and tools such as Prophet, NeuralProphet, and N-BEATS for model implementation and validation. The survey concludes with recommendations for future research, addressing challenges in model adaptation and optimisation for real-world applications.

AIJul 15, 2024
XEQ Scale for Evaluating XAI Experience Quality

Anjana Wijekoon, Nirmalie Wiratunga, David Corsar et al.

Explainable Artificial Intelligence (XAI) aims to improve the transparency of autonomous decision-making through explanations. Recent literature has emphasised users' need for holistic "multi-shot" explanations and personalised engagement with XAI systems. We refer to this user-centred interaction as an XAI Experience. Despite advances in creating XAI experiences, evaluating them in a user-centred manner has remained challenging. In response, we developed the XAI Experience Quality (XEQ) Scale. XEQ quantifies the quality of experiences across four dimensions: learning, utility, fulfilment and engagement. These contributions extend the state-of-the-art of XAI evaluation, moving beyond the one-dimensional metrics frequently developed to assess single-shot explanations. This paper presents the XEQ scale development and validation process, including content validation with XAI experts, and discriminant and construct validation through a large-scale pilot study. Our pilot study results offer strong evidence that establishes the XEQ Scale as a comprehensive framework for evaluating user-centred XAI experiences.

AIOct 3, 2023
Towards Feasible Counterfactual Explanations: A Taxonomy Guided Template-based NLG Method

Pedram Salimi, Nirmalie Wiratunga, David Corsar et al.

Counterfactual Explanations (cf-XAI) describe the smallest changes in feature values necessary to change an outcome from one class to another. However, many cf-XAI methods neglect the feasibility of those changes. In this paper, we introduce a novel approach for presenting cf-XAI in natural language (Natural-XAI), giving careful consideration to actionable and comprehensible aspects while remaining cognizant of immutability and ethical concerns. We present three contributions to this endeavor. Firstly, through a user study, we identify two types of themes present in cf-XAI composed by humans: content-related, focusing on how features and their values are included from both the counterfactual and the query perspectives; and structure-related, focusing on the structure and terminology used for describing necessary value changes. Secondly, we introduce a feature actionability taxonomy with four clearly defined categories, to streamline the explanation presentation process. Using insights from the user study and our taxonomy, we created a generalisable template-based natural language generation (NLG) method compatible with existing explainers like DICE, NICE, and DisCERN, to produce counterfactuals that address the aforementioned limitations of existing approaches. Finally, we conducted a second user study to assess the performance of our taxonomy-guided NLG templates on three domains. Our findings show that the taxonomy-guided Natural-XAI approach (n-XAI^T) received higher user ratings across all dimensions, with significantly improved results in the majority of the domains assessed for articulation, acceptability, feasibility, and sensitivity dimensions.

DCSep 8, 2024
FedFT: Improving Communication Performance for Federated Learning with Frequency Space Transformation

Chamath Palihawadana, Nirmalie Wiratunga, Anjana Wijekoon et al.

Communication efficiency is a widely recognised research problem in Federated Learning (FL), with recent work focused on developing techniques for efficient compression, distribution and aggregation of model parameters between clients and the server. Particularly within distributed systems, it is important to balance the need for computational cost and communication efficiency. However, existing methods are often constrained to specific applications and are less generalisable. In this paper, we introduce FedFT (federated frequency-space transformation), a simple yet effective methodology for communicating model parameters in a FL setting. FedFT uses Discrete Cosine Transform (DCT) to represent model parameters in frequency space, enabling efficient compression and reducing communication overhead. FedFT is compatible with various existing FL methodologies and neural architectures, and its linear property eliminates the need for multiple transformations during federated aggregation. This methodology is vital for distributed solutions, tackling essential challenges like data privacy, interoperability, and energy efficiency inherent to these environments. We demonstrate the generalisability of the FedFT methodology on four datasets using comparative studies with three state-of-the-art FL baselines (FedAvg, FedProx, FedSim). Our results demonstrate that using FedFT to represent the differences in model parameters between communication rounds in frequency space results in a more compact representation compared to representing the entire model in frequency space. This leads to a reduction in communication overhead, while keeping accuracy levels comparable and in some cases even improving it. Our results suggest that this reduction can range from 5% to 30% per client, depending on dataset.

AIAug 23, 2024
iSee: Advancing Multi-Shot Explainable AI Using Case-based Recommendations

Anjana Wijekoon, Nirmalie Wiratunga, David Corsar et al.

Explainable AI (XAI) can greatly enhance user trust and satisfaction in AI-assisted decision-making processes. Recent findings suggest that a single explainer may not meet the diverse needs of multiple users in an AI system; indeed, even individual users may require multiple explanations. This highlights the necessity for a "multi-shot" approach, employing a combination of explainers to form what we introduce as an "explanation strategy". Tailored to a specific user or a user group, an "explanation experience" describes interactions with personalised strategies designed to enhance their AI decision-making processes. The iSee platform is designed for the intelligent sharing and reuse of explanation experiences, using Case-based Reasoning to advance best practices in XAI. The platform provides tools that enable AI system designers, i.e. design users, to design and iteratively revise the most suitable explanation strategy for their AI system to satisfy end-user needs. All knowledge generated within the iSee platform is formalised by the iSee ontology for interoperability. We use a summative mixed methods study protocol to evaluate the usability and utility of the iSee platform with six design users across varying levels of AI and XAI expertise. Our findings confirm that the iSee platform effectively generalises across applications and its potential to promote the adoption of XAI best practices.

21.2IRMay 15
RAPT: Retrieval-Augmented Post-hoc Thresholding for Multi-Label Classification

Lasal Jayawardena, Nirmalie Wiratunga, Ikechukwu Nkisi-Orji et al.

Industrial multi-label document understanding pipelines score candidate labels and threshold or rank them to form a label set per document. This early selection step directly affects the accuracy of downstream information extraction from the document, as well as the associated verification effort. In practice, OCR noise, label imbalance, instance-dependent label cardinality, and asymmetric error costs make global score thresholds brittle and hard to maintain as document formats evolve. We present RAPT, a deployment-oriented retrieval-augmented score thresholding wrapper, applied post-hoc to improve label set selection without retraining the underlying classifier. RAPT is a model-agnostic wrapper: any predictor that provides document representations for similarity search and per label confidence scores can be used, including metric learning encoders and fine-tuned transformer classifiers. For each query document, given a classifier's score vector, RAPT retrieves similar document thresholding situations (cases) and adapts the query's label set selection threshold using their outcomes. The adaptation selects the final label set by locally aggregating neighbour solutions (e.g. average label count, cutoff calibration). Evaluation compared multi-label classifiers (metric learners and transformers) combined with RAPT against global and label-wise thresholding baselines, and against few-shot LLMs. Across an industrial dataset and six public benchmarks, RAPT consistently outperformed global and label-wise static thresholding baselines. In the industrial setting, RAPT achieved its best predictive performance with metric learners, reaching 0.87 Macro-F1, while fine-tuned transformer variants on average achieved 0.775 Macro-F1, outperforming fewshot LLM baselines (K = 5) by 2x and requiring at least 115x less inference time and 13.5x less GPU memory.

CLApr 4, 2024
CBR-RAG: Case-Based Reasoning for Retrieval Augmented Generation in LLMs for Legal Question Answering

Nirmalie Wiratunga, Ramitha Abeyratne, Lasal Jayawardena et al.

Retrieval-Augmented Generation (RAG) enhances Large Language Model (LLM) output by providing prior knowledge as context to input. This is beneficial for knowledge-intensive and expert reliant tasks, including legal question-answering, which require evidence to validate generated text outputs. We highlight that Case-Based Reasoning (CBR) presents key opportunities to structure retrieval as part of the RAG process in an LLM. We introduce CBR-RAG, where CBR cycle's initial retrieval stage, its indexing vocabulary, and similarity knowledge containers are used to enhance LLM queries with contextually relevant cases. This integration augments the original LLM query, providing a richer prompt. We present an evaluation of CBR-RAG, and examine different representations (i.e. general and domain-specific embeddings) and methods of comparison (i.e. inter, intra and hybrid similarity) on the task of legal question-answering. Our results indicate that the context provided by CBR's case reuse enforces similarity between relevant components of the questions and the evidence base leading to significant improvements in the quality of generated answers.

HCMay 16, 2024
Tell me more: Intent Fulfilment Framework for Enhancing User Experiences in Conversational XAI

Anjana Wijekoon, David Corsar, Nirmalie Wiratunga et al.

The evolution of Explainable Artificial Intelligence (XAI) has emphasised the significance of meeting diverse user needs. The approaches to identifying and addressing these needs must also advance, recognising that explanation experiences are subjective, user-centred processes that interact with users towards a better understanding of AI decision-making. This paper delves into the interrelations in multi-faceted XAI and examines how different types of explanations collaboratively meet users' XAI needs. We introduce the Intent Fulfilment Framework (IFF) for creating explanation experiences. The novelty of this paper lies in recognising the importance of "follow-up" on explanations for obtaining clarity, verification and/or substitution. Moreover, the Explanation Experience Dialogue Model integrates the IFF and "Explanation Followups" to provide users with a conversational interface for exploring their explanation needs, thereby creating explanation experiences. Quantitative and qualitative findings from our comparative user study demonstrate the impact of the IFF in improving user engagement, the utility of the AI system and the overall user experience. Overall, we reinforce the principle that "one explanation does not fit all" to create explanation experiences that guide the complex interaction through conversation.

CLSep 4, 2025
Cross-Layer Attention Probing for Fine-Grained Hallucination Detection

Malavika Suresh, Rahaf Aljundi, Ikechukwu Nkisi-Orji et al.

With the large-scale adoption of Large Language Models (LLMs) in various applications, there is a growing reliability concern due to their tendency to generate inaccurate text, i.e. hallucinations. In this work, we propose Cross-Layer Attention Probing (CLAP), a novel activation probing technique for hallucination detection, which processes the LLM activations across the entire residual stream as a joint sequence. Our empirical evaluations using five LLMs and three tasks show that CLAP improves hallucination detection compared to baselines on both greedy decoded responses as well as responses sampled at higher temperatures, thus enabling fine-grained detection, i.e. the ability to disambiguate hallucinations and non-hallucinations among different sampled responses to a given prompt. This allows us to propose a detect-then-mitigate strategy using CLAP to reduce hallucinations and improve LLM reliability compared to direct mitigation approaches. Finally, we show that CLAP maintains high reliability even when applied out-of-distribution.

LGSep 13, 2021
DisCERN:Discovering Counterfactual Explanations using Relevance Features from Neighbourhoods

Nirmalie Wiratunga, Anjana Wijekoon, Ikechukwu Nkisi-Orji et al.

Counterfactual explanations focus on "actionable knowledge" to help end-users understand how a machine learning outcome could be changed to a more desirable outcome. For this purpose a counterfactual explainer needs to discover input dependencies that relate to outcome changes. Identifying the minimum subset of feature changes needed to action an output change in the decision is an interesting challenge for counterfactual explainers. The DisCERN algorithm introduced in this paper is a case-based counter-factual explainer. Here counterfactuals are formed by replacing feature values from a nearest unlike neighbour (NUN) until an actionable change is observed. We show how widely adopted feature relevance-based explainers (i.e. LIME, SHAP), can inform DisCERN to identify the minimum subset of "actionable features". We demonstrate our DisCERN algorithm on five datasets in a comparative study with the widely used optimisation-based counterfactual approach DiCE. Our results demonstrate that DisCERN is an effective strategy to minimise actionable changes necessary to create good counterfactual explanations.

CVJun 12, 2020
Learning-to-Learn Personalised Human Activity Recognition Models

Anjana Wijekoon, Nirmalie Wiratunga

Human Activity Recognition~(HAR) is the classification of human movement, captured using one or more sensors either as wearables or embedded in the environment~(e.g. depth cameras, pressure mats). State-of-the-art methods of HAR rely on having access to a considerable amount of labelled data to train deep architectures with many train-able parameters. This becomes prohibitive when tasked with creating models that are sensitive to personal nuances in human movement, explicitly present when performing exercises. In addition, it is not possible to collect training data to cover all possible subjects in the target population. Accordingly, learning personalised models with few data remains an interesting challenge for HAR research. We present a meta-learning methodology for learning to learn personalised HAR models for HAR; with the expectation that the end-user need only provides a few labelled data but can benefit from the rapid adaptation of a generic meta-model. We introduce two algorithms, Personalised MAML and Personalised Relation Networks inspired by existing Meta-Learning algorithms but optimised for learning HAR models that are adaptable to any person in health and well-being applications. A comparative study shows significant performance improvements against the state-of-the-art Deep Learning algorithms and the Few-shot Meta-Learning algorithms in multiple HAR domains.

HCApr 29, 2020
FitChat: Conversational Artificial Intelligence Interventions for Encouraging Physical Activity in Older Adults

Nirmalie Wiratunga, Kay Cooper, Anjana Wijekoon et al.

Delivery of digital behaviour change interventions which encourage physical activity has been tried in many forms. Most often interventions are delivered as text notifications, but these do not promote interaction. Advances in conversational AI have improved natural language understanding and generation, allowing AI chatbots to provide an engaging experience with the user. For this reason, chatbots have recently been seen in healthcare delivering digital interventions through free text or choice selection. In this work, we explore the use of voice-based AI chatbots as a novel mode of intervention delivery, specifically targeting older adults to encourage physical activity. We co-created "FitChat", an AI chatbot, with older adults and we evaluate the first prototype using Think Aloud Sessions. Our thematic evaluation suggests that older adults prefer voice-based chat over text notifications or free text entry and that voice is a powerful mode for encouraging motivation.

CVAug 13, 2019
MEx: Multi-modal Exercises Dataset for Human Activity Recognition

Anjana Wijekoon, Nirmalie Wiratunga, Kay Cooper

MEx: Multi-modal Exercises Dataset is a multi-sensor, multi-modal dataset, implemented to benchmark Human Activity Recognition(HAR) and Multi-modal Fusion algorithms. Collection of this dataset was inspired by the need for recognising and evaluating quality of exercise performance to support patients with Musculoskeletal Disorders(MSD). We select 7 exercises regularly recommended for MSD patients by physiotherapists and collected data with four sensors a pressure mat, a depth camera and two accelerometers. The dataset contains three data modalities; numerical time-series data, video data and pressure sensor data posing interesting research challenges when reasoning for HAR and Exercise Quality Assessment. This paper presents our evaluation of the dataset on number of standard classification algorithms for the HAR task by comparing different feature representation algorithms for each sensor. These results set a reference performance for each individual sensor that expose their strengths and weaknesses for the future tasks. In addition we visualise pressure mat data to explore the potential of the sensor to capture exercise performance quality. With the recent advancement in multi-modal fusion, we also believe MEx is a suitable dataset to benchmark not only HAR algorithms, but also fusion algorithms of heterogeneous data types in multiple application domains.