LG · Mar 11
CUPID: A Plug-in Framework for Joint Aleatoric and Epistemic Uncertainty Estimation with a Single Model
Xinran Xu, Xiuyi Fan
Accurate estimation of uncertainty in deep learning is critical for deploying models in high-stakes domains such as medical diagnosis and autonomous decision-making, where overconfident predictions can lead to harmful outcomes. In practice, understanding the reason behind a model's uncertainty and the type of uncertainty it represents can support risk-aware decisions, enhance user trust, and guide additional data collection. However, many existing methods only address a single type of uncertainty or require modifications and retraining of the base model, making them difficult to adopt in real-world systems. We introduce CUPID (Comprehensive Uncertainty Plug-in estImation moDel), a general-purpose module that jointly estimates aleatoric and epistemic uncertainty without modifying or retraining the base model. CUPID can be flexibly inserted into any layer of a pretrained network. It models aleatoric uncertainty through a learned Bayesian identity mapping and captures epistemic uncertainty by analyzing the model's internal responses to structured perturbations. We evaluate CUPID across a range of tasks, including classification, regression, and out-of-distribution detection. The results show that it consistently delivers competitive performance while offering layer-wise insights into the origins of uncertainty. By making uncertainty estimation modular, interpretable, and model-agnostic, CUPID supports more transparent and trustworthy AI. Related code and data are available at https://github.com/a-Fomalhaut-a/CUPID.
AI · Sep 1, 2022
Probabilistic Deduction: an Approach to Probabilistic Structured Argumentation
Xiuyi Fan
This paper introduces Probabilistic Deduction (PD) as an approach to probabilistic structured argumentation. A PD framework is composed of probabilistic rules (p-rules). Like rules in classical structured argumentation frameworks, p-rules form deduction systems. In addition, p-rules also represent conditional probabilities that define joint probability distributions. With PD frameworks, one performs probabilistic reasoning by solving Rule-Probabilistic Satisfiability. At the same time, one can obtain an argumentative reading of the probabilistic reasoning with arguments and attacks. In this work, we introduce a probabilistic version of the Closed-World Assumption (P-CWA) and prove that our probabilistic approach coincides with the complete extension in classical argumentation under P-CWA and with maximum entropy reasoning. We present several approaches to compute the joint probability distribution from p-rules for achieving a practical proof theory for PD. PD provides a framework to unify probabilistic reasoning with argumentative reasoning. This is the first work in probabilistic structured argumentation where the joint distribution is not assumed from external sources.
AI · Mar 23, 2022
On Understanding the Influence of Controllable Factors with a Feature Attribution Algorithm: a Medical Case Study
Veera Raghava Reddy Kovvuri, Siyuan Liu, Monika Seisenberger et al.
Feature attribution XAI algorithms enable their users to gain insight into the underlying patterns of large datasets through their feature importance calculation. Existing feature attribution algorithms treat all features in a dataset homogeneously, which may lead to misinterpretation of consequences of changing feature values. In this work, we consider partitioning features into controllable and uncontrollable parts and propose the Controllable fActor Feature Attribution (CAFA) approach to compute the relative importance of controllable features. We carried out experiments applying CAFA to two existing datasets and our own COVID-19 non-pharmaceutical control measures dataset. Experimental results show that with CAFA, we are able to exclude influences from uncontrollable features in our explanation while keeping the full dataset for prediction.
LG · Jan 23
Provably Robust Bayesian Counterfactual Explanations under Model Changes
Jamie Duell, Xiuyi Fan
Counterfactual explanations (CEs) offer interpretable insights into machine learning predictions by answering "what if?" questions. However, in real-world settings where models are frequently updated, existing counterfactual explanations can quickly become invalid or unreliable. In this paper, we introduce Probabilistically Safe CEs (PSCE), a method for generating counterfactual explanations that are $δ$-safe, to ensure high predictive confidence, and $ε$-robust, to ensure low predictive variance. Based on Bayesian principles, PSCE provides formal probabilistic guarantees for CEs under model changes which are adhered to in what we refer to as the $\langle δ, ε\rangle$-set. Uncertainty-aware constraints are integrated into our optimization framework and we validate our method empirically across diverse datasets. We compare our approach against state-of-the-art Bayesian CE methods, where PSCE produces counterfactual explanations that are not only more plausible and discriminative, but also provably robust under model change.
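The two constraints named in the abstract above can be illustrated with a small check. This is only a sketch under stated assumptions, not the PSCE method itself: it assumes that "δ-safe" can be read as the mean predicted probability of the target class (over models sampled from a posterior) being at least δ, and "ε-robust" as the variance of those probabilities being at most ε. The function name `in_delta_epsilon_set` and the sample values are hypothetical.

```python
from statistics import fmean, pvariance
from typing import Sequence


def in_delta_epsilon_set(target_probs: Sequence[float], delta: float, epsilon: float) -> bool:
    """Check a candidate counterfactual against both thresholds.

    target_probs holds the target-class probability that each sampled model
    assigns to the candidate (an assumed Monte Carlo approximation).
    """
    if not target_probs:
        raise ValueError("need at least one sampled prediction")
    mean_confidence = fmean(target_probs)       # assumed reading of delta-safety
    predictive_variance = pvariance(target_probs)  # assumed reading of epsilon-robustness
    return mean_confidence >= delta and predictive_variance <= epsilon


# A candidate that all sampled models agree on passes both checks.
stable = [0.90, 0.92, 0.88]
# A candidate that one sampled model rejects fails the variance check.
unstable = [0.99, 0.50, 0.99]

print(in_delta_epsilon_set(stable, delta=0.8, epsilon=0.01))
print(in_delta_epsilon_set(unstable, delta=0.8, epsilon=0.01))
```

Under this reading, a counterfactual can be confident on average and still be rejected because the sampled models disagree about it, which is the situation the variance threshold is meant to catch.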
AI · Sep 11, 2024
"My Grade is Wrong!": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays
Shengxin Hong, Chang Cai, Sixuan Du et al.
Interactive feedback, where feedback flows in both directions between teacher and student, is more effective than traditional one-way feedback. However, it is often too time-consuming for widespread use in educational practice. While Large Language Models (LLMs) have potential for automating feedback, they struggle with reasoning and interaction in an interactive setting. This paper introduces CAELF, a Contestable AI Empowered LLM Framework for automating interactive feedback. CAELF allows students to query, challenge, and clarify their feedback by integrating a multi-agent system with computational argumentation. Essays are first assessed by multiple Teaching-Assistant Agents (TA Agents), and then a Teacher Agent aggregates the evaluations through formal reasoning to generate feedback and grades. Students can further engage with the feedback to refine their understanding. A case study on 500 critical thinking essays with user studies demonstrates that CAELF significantly improves interactive feedback, enhancing the reasoning and interaction capabilities of LLMs. This approach offers a promising solution to overcoming the time and resource barriers that have limited the adoption of interactive feedback in educational settings.
LG · Jan 27
Robust Uncertainty Estimation under Distribution Shift via Difference Reconstruction
Xinran Xu, Li Rong Wang, Xiuyi Fan
Estimating uncertainty in deep learning models is critical for reliable decision-making in high-stakes applications such as medical imaging. Prior research has established that the difference between an input sample and its reconstructed version produced by an auxiliary model can serve as a useful proxy for uncertainty. However, directly comparing reconstructions with the original input is degraded by information loss and sensitivity to superficial details, which limits its effectiveness. In this work, we propose Difference Reconstruction Uncertainty Estimation (DRUE), a method that mitigates this limitation by reconstructing inputs from two intermediate layers and measuring the discrepancy between their outputs as the uncertainty score. To evaluate uncertainty estimation in practice, we follow the widely used out-of-distribution (OOD) detection paradigm, where in-distribution (ID) training data are compared against datasets with increasing domain shift. Using glaucoma detection as the ID task, we demonstrate that DRUE consistently achieves superior AUC and AUPR across multiple OOD datasets, highlighting its robustness and reliability under distribution shift. This work provides a principled and effective framework for enhancing model reliability in uncertain environments.
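The scoring idea described above (reconstruct the input from two intermediate layers, then compare the two reconstructions with each other rather than with the input) can be sketched as follows. This is an illustrative sketch, not the DRUE implementation: the decoders are hypothetical stand-ins passed in as plain functions, and the mean squared difference is an assumed choice of discrepancy measure that the abstract does not specify.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Decoder = Callable[[Vector], Vector]


def difference_reconstruction_score(
    features_a: Vector,
    features_b: Vector,
    decode_a: Decoder,
    decode_b: Decoder,
) -> float:
    """Uncertainty score as the discrepancy between two reconstructions.

    features_a / features_b are activations from two intermediate layers;
    decode_a / decode_b are hypothetical decoders mapping them back to input space.
    """
    recon_a = decode_a(features_a)
    recon_b = decode_b(features_b)
    if len(recon_a) != len(recon_b):
        raise ValueError("reconstructions must have the same shape")
    # Assumed discrepancy measure: mean squared difference.
    return sum((a - b) ** 2 for a, b in zip(recon_a, recon_b)) / len(recon_a)


# Toy decoders standing in for trained reconstruction networks.
decode_shallow = lambda f: [2 * x for x in f]
decode_deep = lambda f: [x + 1 for x in f]

# The two reconstructions agree, so the score is zero.
print(difference_reconstruction_score([1, 2], [1, 3], decode_shallow, decode_deep))
# The two reconstructions disagree, so the score is larger.
print(difference_reconstruction_score([1, 2], [3, 5], decode_shallow, decode_deep))
```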
ST · Apr 6, 2023
Stock Price Predictability and the Business Cycle via Machine Learning
Li Rong Wang, Hsuan Fu, Xiuyi Fan
We study the impacts of business cycles on machine learning (ML) predictions. Using the S&P 500 index, we find that ML models perform worse during most recessions, and the inclusion of recession history or the risk-free rate does not necessarily improve their performance. Investigating recessions where models perform well, we find that they exhibit lower market volatility than other recessions. This implies that the improved performance is not due to the merit of ML methods but rather factors such as effective monetary policies that stabilized the market. We recommend that ML practitioners evaluate their models during both recessions and expansions.
LG · Feb 27, 2024
QUCE: The Minimisation and Quantification of Path-Based Uncertainty for Generative Counterfactual Explanations
Jamie Duell, Monika Seisenberger, Hsuan Fu et al.
Deep Neural Networks (DNNs) stand out as one of the most prominent approaches within the Machine Learning (ML) domain. The efficacy of DNNs has surged alongside recent increases in computational capacity, allowing these approaches to scale to significant complexities for addressing predictive challenges in big data. However, as the complexity of DNN models rises, interpretability diminishes. In response to this challenge, explainable models such as Adversarial Gradient Integration (AGI) leverage path-based gradients provided by DNNs to elucidate their decisions. Yet the performance of path-based explainers can be compromised when gradients exhibit irregularities during out-of-distribution path traversal. In this context, we introduce Quantified Uncertainty Counterfactual Explanations (QUCE), a method designed to mitigate out-of-distribution traversal by minimizing path uncertainty. QUCE not only quantifies uncertainty when presenting explanations but also generates more certain counterfactual examples. We showcase the performance of the QUCE method by comparing it with competing methods for both path-based explanations and generative counterfactual examples.
CV · Dec 5, 2025
Explainable Melanoma Diagnosis with Contrastive Learning and LLM-based Report Generation
Junwen Zheng, Xinran Xu, Li Rong Wang et al.
Deep learning has demonstrated expert-level performance in melanoma classification, positioning it as a powerful tool in clinical dermatology. However, model opacity and the lack of interpretability remain critical barriers to clinical adoption, as clinicians often struggle to trust the decision-making processes of black-box models. To address this gap, we present a Cross-modal Explainable Framework for Melanoma (CEFM) that leverages contrastive learning as the core mechanism for achieving interpretability. Specifically, CEFM maps clinical criteria for melanoma diagnosis, namely Asymmetry, Border, and Color (ABC), into the Vision Transformer embedding space using dual projection heads, thereby aligning clinical semantics with visual features. The aligned representations are subsequently translated into structured textual explanations via natural language generation, creating a transparent link between raw image data and clinical interpretation. Experiments on public datasets demonstrate 92.79% accuracy and an AUC of 0.961, along with significant improvements across multiple interpretability metrics. Qualitative analyses further show that the spatial arrangement of the learned embeddings aligns with clinicians' application of the ABC rule, effectively bridging the gap between high-performance classification and clinical trust.
AI · Sep 14, 2025
Position Paper: Integrating Explainability and Uncertainty Estimation in Medical AI
Xiuyi Fan
Uncertainty is a fundamental challenge in medical practice, but current medical AI systems fail to explicitly quantify or communicate uncertainty in a way that aligns with clinical reasoning. Existing XAI works focus on interpreting model predictions but do not capture the confidence or reliability of these predictions. Conversely, uncertainty estimation (UE) techniques provide confidence measures but lack intuitive explanations. The disconnect between these two areas limits AI adoption in medicine. To address this gap, we propose Explainable Uncertainty Estimation (XUE) that integrates explainability with uncertainty quantification to enhance trust and usability in medical AI. We systematically map medical uncertainty to AI uncertainty concepts and identify key challenges in implementing XUE. We outline technical directions for advancing XUE, including multimodal uncertainty quantification, model-agnostic visualization techniques, and uncertainty-aware decision support systems. Lastly, we propose guiding principles to ensure effective XUE realisation. Our analysis highlights the need for AI systems that not only generate reliable predictions but also articulate confidence levels in a clinically meaningful way. This work contributes to the development of trustworthy medical AI by bridging explainability and uncertainty, paving the way for AI systems that are aligned with real-world clinical complexities.
LG · Sep 1, 2025
Towards Trustworthy Vital Sign Forecasting: Leveraging Uncertainty for Prediction Intervals
Li Rong Wang, Thomas C. Henderson, Yew Soon Ong et al.
Vital signs, such as heart rate and blood pressure, are critical indicators of patient health and are widely used in clinical monitoring and decision-making. While deep learning models have shown promise in forecasting these signals, their deployment in healthcare remains limited in part because clinicians must be able to trust and interpret model outputs. Without reliable uncertainty quantification -- particularly calibrated prediction intervals (PIs) -- it is unclear whether a forecasted abnormality constitutes a meaningful warning or merely reflects model noise, hindering clinical decision-making. To address this, we present two methods for deriving PIs from the Reconstruction Uncertainty Estimate (RUE), an uncertainty measure well-suited to vital-sign forecasting due to its sensitivity to data shifts and support for label-free calibration. Our parametric approach assumes that prediction errors and uncertainty estimates follow a Gaussian copula distribution, enabling closed-form PI computation. Our non-parametric approach, based on k-nearest neighbours (KNN), empirically estimates the conditional error distribution using similar validation instances. We evaluate these methods on two large public datasets with minute- and hour-level sampling, representing high- and low-frequency health signals. Experiments demonstrate that the Gaussian copula method consistently outperforms conformal prediction baselines on low-frequency data, while the KNN approach performs best on high-frequency data. These results underscore the clinical promise of RUE-derived PIs for delivering interpretable, uncertainty-aware vital sign forecasts.
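The non-parametric approach described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' implementation: it assumes similarity between instances is measured by the absolute difference of their scalar uncertainty estimates, that errors are signed (true value minus forecast), and that the interval is formed from empirical quantiles of the neighbours' errors. The names `knn_prediction_interval` and `empirical_quantile` are hypothetical.

```python
from typing import List, Sequence, Tuple


def empirical_quantile(sorted_values: List[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted list, with q in [0, 1]."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def knn_prediction_interval(
    y_hat: float,
    uncertainty: float,
    val_uncertainty: Sequence[float],
    val_errors: Sequence[float],
    k: int = 5,
    alpha: float = 0.1,
) -> Tuple[float, float]:
    """Prediction interval from the errors of the k most similar validation instances."""
    if len(val_uncertainty) != len(val_errors):
        raise ValueError("validation arrays must have the same length")
    k = min(k, len(val_errors))
    # Assumed similarity: closeness of the uncertainty estimates.
    nearest = sorted(
        range(len(val_errors)),
        key=lambda i: abs(val_uncertainty[i] - uncertainty),
    )[:k]
    errors = sorted(val_errors[i] for i in nearest)
    lower = empirical_quantile(errors, alpha / 2)
    upper = empirical_quantile(errors, 1 - alpha / 2)
    return y_hat + lower, y_hat + upper


# Toy validation set: low-uncertainty instances have small errors,
# high-uncertainty instances have large errors.
val_u = [0.1, 0.2, 0.3, 1.0, 1.1, 1.2]
val_e = [-1.0, 0.0, 1.0, -10.0, 0.0, 10.0]

print(knn_prediction_interval(70.0, 0.2, val_u, val_e, k=3, alpha=0.0))
print(knn_prediction_interval(70.0, 1.1, val_u, val_e, k=3, alpha=0.0))
```

With the toy data, a forecast of 70 with low uncertainty gets a narrow interval and the same forecast with high uncertainty gets a wide one, because the interval width is driven by the errors of validation instances with similar uncertainty.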
LG · Aug 31, 2025
Causal SHAP: Feature Attribution with Dependency Awareness through Causal Discovery
Woon Yee Ng, Li Rong Wang, Siyuan Liu et al.
Explaining machine learning (ML) predictions has become crucial as ML models are increasingly deployed in high-stakes domains such as healthcare. While SHapley Additive exPlanations (SHAP) is widely used for model interpretability, it fails to differentiate between causality and correlation, often misattributing feature importance when features are highly correlated. We propose Causal SHAP, a novel framework that integrates causal relationships into feature attribution while preserving many desirable properties of SHAP. By combining the Peter-Clark (PC) algorithm for causal discovery and the Intervention Calculus when the DAG is Absent (IDA) algorithm for causal strength quantification, our approach addresses the weakness of SHAP. Specifically, Causal SHAP reduces attribution scores for features that are merely correlated with the target, as validated through experiments on both synthetic and real-world datasets. This study contributes to the field of Explainable AI (XAI) by providing a practical framework for causal-aware model explanations. Our approach is particularly valuable in domains such as healthcare, where understanding true causal relationships is critical for informed decision-making.
CY · May 23, 2023
Embrace Opportunities and Face Challenges: Using ChatGPT in Undergraduate Students' Collaborative Interdisciplinary Learning
Gaoxia Zhu, Xiuyi Fan, Chenyu Hou et al.
ChatGPT, launched in November 2022, has gained widespread attention from students and educators globally, with an online report by Hu (2023) describing it as the fastest-growing consumer application in history. While discussions on the use of ChatGPT in higher education are abundant, empirical studies on its impact on collaborative interdisciplinary learning are rare. To investigate its potential, we conducted a quasi-experimental study with 130 undergraduate students (STEM and non-STEM) learning digital literacy with or without ChatGPT over two weeks. Weekly surveys were conducted on collaborative interdisciplinary problem-solving, physical and cognitive engagement, and individual reflections on ChatGPT use. Analysis of survey responses showed significant main effects of topics on collaborative interdisciplinary problem-solving and physical and cognitive engagement, a marginal interaction effect between disciplinary backgrounds and ChatGPT conditions for cognitive engagement, and a significant interaction effect for physical engagement. Sentiment analysis of student reflections suggested no significant difference between STEM and non-STEM students' opinions towards ChatGPT. Qualitative analysis of reflections generated eight positive themes, including efficiency, addressing knowledge gaps, and generating human-like responses, and eight negative themes, including generic responses, lack of innovation, and being counterproductive to self-discipline and thinking. Our findings suggest that ChatGPT use needs to be optimized by considering the topics being taught and the disciplinary backgrounds of students rather than applying it uniformly. These findings have implications for both pedagogical research and practices.
AI · Jan 18, 2022
Explainable Decision Making with Lean and Argumentative Explanations
Xiuyi Fan, Francesca Toni
It is widely acknowledged that transparency of automated decision making is crucial for deployability of intelligent systems, and explaining the reasons why some decisions are "good" and some are not is a way of achieving this transparency. We consider two variants of decision making, where "good" decisions amount to alternatives (i) meeting "most" goals, and (ii) meeting "most preferred" goals. We then define, for each variant and notion of "goodness" (corresponding to a number of existing notions in the literature), explanations in two formats, for justifying the selection of an alternative to audiences with differing needs and competences: lean explanations, in terms of goals satisfied and, for some notions of "goodness", alternative decisions, and argumentative explanations, reflecting the decision process leading to the selection, while corresponding to the lean explanations. To define argumentative explanations, we use assumption-based argumentation (ABA), a well-known form of structured argumentation. Specifically, we define ABA frameworks such that "good" decisions are admissible ABA arguments and draw argumentative explanations from dispute trees sanctioning this admissibility. Finally, we instantiate our overall framework for explainable decision-making to accommodate connections between goals and decisions in terms of decision graphs incorporating defeasible and non-defeasible information.
AI · Dec 29, 2021
Towards a Shapley Value Graph Framework for Medical peer-influence
Jamie Duell, Monika Seisenberger, Gert Aarts et al.
eXplainable Artificial Intelligence (XAI) is a sub-field of Artificial Intelligence (AI) that is at the forefront of AI research. In XAI, feature attribution methods produce explanations in the form of feature importance. People often use feature importance as guidance for intervention. However, a limitation of existing feature attribution methods is that they do not explain the consequence of intervention. In other words, although contribution towards a certain prediction is highlighted by feature attribution methods, the relation between features and the consequence of intervention is not studied. The aim of this paper is to introduce a new framework, called a peer influence framework, that looks deeper into explanations by using a graph representation of feature-to-feature interactions to improve the interpretability of black-box Machine Learning models and inform intervention.
AI · May 20, 2021
Evaluating the Correctness of Explainable AI Algorithms for Classification
Orcun Yalcin, Xiuyi Fan, Siyuan Liu
Explainable AI has attracted much research attention in recent years, with feature attribution algorithms, which compute "feature importance" in predictions, becoming increasingly popular. However, there is little analysis of the validity of these algorithms, as there is no "ground truth" in the existing datasets to validate their correctness. In this work, we develop a method to quantitatively evaluate the correctness of XAI algorithms by creating datasets with known explanation ground truth. To this end, we focus on binary classification problems. String datasets are constructed using a formal language derived from a grammar. A string is positive if and only if a certain property is fulfilled. Symbols serving as explanation ground truth in a positive string are part of an explanation if and only if they contribute to fulfilling the property. Two popular feature attribution explainers, Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), are used in our experiments. We show that: (1) classification accuracy is positively correlated with explanation accuracy; (2) SHAP provides more accurate explanations than LIME; (3) explanation accuracy is negatively correlated with dataset complexity.
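The dataset construction described above can be illustrated with a toy example. This is a sketch under assumptions rather than the paper's grammar: here the hypothetical property is "the string contains the substring `ab`", the ground-truth explanation is the set of positions covered by that substring, and explanation accuracy is assumed to be the overlap between the ground truth and the equally many highest-attributed positions. All names and constants are illustrative.

```python
import random
from typing import List, Set, Tuple

ALPHABET = "abc"   # hypothetical alphabet
PATTERN = "ab"     # hypothetical property: the string contains this substring


def ground_truth_positions(s: str, pattern: str = PATTERN) -> Set[int]:
    """Positions of the symbols that contribute to fulfilling the property."""
    positions: Set[int] = set()
    start = s.find(pattern)
    while start != -1:
        positions.update(range(start, start + len(pattern)))
        start = s.find(pattern, start + 1)
    return positions


def make_dataset(n: int, length: int, seed: int = 0) -> List[Tuple[str, int, Set[int]]]:
    """Random strings with a binary label and the explanation ground truth."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = "".join(rng.choice(ALPHABET) for _ in range(length))
        truth = ground_truth_positions(s)
        data.append((s, int(bool(truth)), truth))
    return data


def explanation_accuracy(attributions: List[float], truth: Set[int]) -> float:
    """Share of ground-truth positions found among the top-|truth| attributions."""
    if not truth:
        raise ValueError("explanation accuracy is defined for positive strings only")
    k = len(truth)
    top = sorted(range(len(attributions)), key=lambda i: attributions[i], reverse=True)[:k]
    return len(set(top) & truth) / k


truth = ground_truth_positions("cabca")
print(truth)
# Attributions (e.g. from LIME or SHAP) that peak on the ground-truth symbols.
print(explanation_accuracy([0.0, 0.9, 0.8, 0.1, 0.0], truth))
# Attributions that get only one of the two symbols right.
print(explanation_accuracy([0.9, 0.8, 0.0, 0.0, 0.0], truth))
```

Because every positive string carries its own ground truth, the attribution scores produced by any explainer can be scored without human annotation.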
CY · May 5, 2020
An Investigation of COVID-19 Spreading Factors with Explainable AI Techniques
Xiuyi Fan, Siyuan Liu, Jiarong Chen et al.
Since COVID-19 was first identified in December 2019, various public health interventions have been implemented across the world. As different measures are implemented in different countries at different times, we conduct an assessment of the relative effectiveness of the measures implemented in 18 countries and regions using data from 22/01/2020 to 02/04/2020. We compute the top one and two measures that are most effective for the countries and regions studied during the period. Two Explainable AI techniques, SHAP and ECPI, are used in our study: we construct (machine learning) models for predicting the instantaneous reproduction number ($R_t$), use the models as surrogates for the real world, and treat the inputs that have the greatest influence on our models as the measures that are most effective. Across the board, city lockdown and contact tracing are the two most effective measures. For ensuring $R_t<1$, public wearing of face masks is also important. Mass testing alone is not the most effective measure, although when paired with other measures, it can be effective. Warm temperature helps reduce transmission.
AI · May 5, 2020
Explainable AI for Classification using Probabilistic Logic Inference
Xiuyi Fan, Siyuan Liu, Thomas C. Henderson
The overarching goal of Explainable AI is to develop systems that not only exhibit intelligent behaviours, but also are able to explain their rationale and reveal insights. In explainable machine learning, methods that produce a high level of prediction accuracy as well as transparent explanations are valuable. In this work, we present an explainable classification method. Our method works by first constructing a symbolic Knowledge Base from the training data, and then performing probabilistic inferences on this Knowledge Base with linear programming. Our approach achieves a level of learning performance comparable to that of traditional classifiers such as random forests, support vector machines and neural networks. It identifies decisive features that are responsible for a classification as explanations and produces results similar to the ones found by SHAP, a state-of-the-art Shapley Value based method. Our algorithms perform well on a range of synthetic and non-synthetic data sets.