CV Mar 12, 2022
Evaluating Explainable AI on a Multi-Modal Medical Imaging Task: Can Existing Algorithms Fulfill Clinical Requirements?
Weina Jin, Xiaoxiao Li, Ghassan Hamarneh
Being able to explain the prediction to clinical end-users is a necessity to leverage the power of artificial intelligence (AI) models for clinical decision support. For medical images, a feature attribution map, or heatmap, is the most common form of explanation that highlights important features for AI models' prediction. However, it is unknown how well heatmaps perform on explaining decisions on multi-modal medical images, where each image modality or channel visualizes distinct clinical information of the same underlying biomedical phenomenon. Understanding such modality-dependent features is essential for clinical users' interpretation of AI decisions. To tackle this clinically important but technically ignored problem, we propose the modality-specific feature importance (MSFI) metric. It encodes clinical image and explanation interpretation patterns of modality prioritization and modality-specific feature localization. We conduct a clinical requirement-grounded, systematic evaluation using computational methods and a clinician user study. Results show that the 16 examined heatmap algorithms failed to fulfill the clinical requirement of correctly indicating the AI model's decision process or decision quality. The evaluation and MSFI metric can guide the design and selection of XAI algorithms to meet clinical requirements on multi-modal explanation.
HC Feb 10, 2023
Invisible Users: Uncovering End-Users' Requirements for Explainable AI via Explanation Forms and Goals
Weina Jin, Jianyu Fan, Diane Gromala et al.
Non-technical end-users are silent and invisible users of the state-of-the-art explainable artificial intelligence (XAI) technologies. Their demands and requirements for AI explainability are not incorporated into the design and evaluation of XAI techniques, which are developed to explain the rationales of AI decisions to end-users and assist their critical decisions. This makes XAI techniques ineffective or even harmful in high-stakes applications, such as healthcare, criminal justice, finance, and autonomous driving systems. To systematically understand end-users' requirements to support the technical development of XAI, we conducted the EUCA user study with 32 layperson participants in four AI-assisted critical tasks. The study identified comprehensive user requirements for feature-, example-, and rule-based XAI techniques (manifested by the end-user-friendly explanation forms) and XAI evaluation objectives (manifested by the explanation goals), which were shown to be helpful to directly inspire the proposal of new XAI algorithms and evaluation metrics. The EUCA study findings, the identified explanation forms and goals for technical specification, and the EUCA study dataset support the design and evaluation of end-user-centered XAI techniques for accessible, safe, and accountable AI.
AI Aug 18, 2022
Transcending XAI Algorithm Boundaries through End-User-Inspired Design
Weina Jin, Jianyu Fan, Diane Gromala et al.
The boundaries of existing explainable artificial intelligence (XAI) algorithms are confined to problems grounded in technical users' demand for explainability. This research paradigm disproportionately ignores the larger group of non-technical end users, who have a much higher demand for AI explanations in diverse explanation goals, such as making safer and better decisions and improving users' predicted outcomes. Lacking explainability-focused functional support for end users may hinder the safe and accountable use of AI in high-stakes domains, such as healthcare, criminal justice, finance, and autonomous driving systems. Built upon prior human factor analysis on end users' requirements for XAI, we identify and model four novel XAI technical problems covering the full spectrum from design to the evaluation of XAI algorithms, including edge-case-based reasoning, customizable counterfactual explanation, collapsible decision tree, and the verifiability metric to evaluate XAI utility. Based on these newly-identified research problems, we also discuss open problems in the technical development of user-centered XAI to inspire future research. Our work bridges human-centered XAI with the technical XAI community, and calls for a new research paradigm on the technical development of user-centered XAI for the responsible use of AI in critical tasks.
AI Mar 30, 2023
Why is plausibility surprisingly problematic as an XAI criterion?
Weina Jin, Xiaoxiao Li, Ghassan Hamarneh
Explainable artificial intelligence (XAI) is motivated by the problem of making AI predictions understandable, transparent, and responsible, as AI becomes increasingly impactful in society and high-stakes domains. The evaluation and optimization criteria of XAI are gatekeepers for XAI algorithms to achieve their expected goals and should withstand rigorous inspection. To improve the scientific rigor of XAI, we conduct a critical examination of a common XAI criterion: plausibility. Plausibility assesses how convincing the AI explanation is to humans, and is usually quantified by metrics of feature localization or feature correlation. Our examination shows that plausibility is an invalid measure of explainability, and human explanations are not the ground truth for XAI, because treating them as such ignores the necessary assumptions underpinning an explanation. Our examination further reveals the consequences of using plausibility as an XAI criterion, including increasing misleading explanations that manipulate users, deteriorating users' trust in the AI system, undermining human autonomy, being unable to achieve complementary human-AI task performance, and abandoning other possible approaches of enhancing understandability. Due to the invalidity of measurements and the ethical issues, this position paper argues that the community should stop using plausibility as a criterion for the evaluation and optimization of XAI algorithms. We also delineate new research approaches to improve XAI in trustworthiness, understandability, and utility to users, including complementary human-AI task performance.
CY Mar 10, 2025
AI for Just Work: Constructing Diverse Imaginations of AI beyond "Replacing Humans"
Weina Jin, Nicholas Vincent, Ghassan Hamarneh
"why" we develop AI. Lacking critical reflections on the general visions and purposes of AI may make the community vulnerable to manipulation. In this position paper, we explore the "why" question of AI. We denote answers to the "why" question the imaginations of AI, which depict our general visions, frames, and mindsets for the prospects of AI. We identify that the prevailing vision in the AI community is largely a monoculture that emphasizes objectives such as replacing humans and improving productivity. Our critical examination of this mainstream imagination highlights its underpinning and potentially unjust assumptions. We then call to diversify our collective imaginations of AI, embedding ethical assumptions from the outset in the imaginations of AI. To facilitate the community's pursuit of diverse imaginations, we demonstrate one process for constructing a new imagination of "AI for just work," and showcase its application in the medical image synthesis task to make it more ethical. We hope this work will help the AI community to open critical dialogues with civil society on the visions and purposes of AI, and inspire more technical works and advocacy in pursuit of diverse and ethical imaginations to restore the value of AI for the public good.
CY Aug 12, 2025
Ethical Medical Image Synthesis
Weina Jin, Ashish Sinha, Kumar Abhishek et al.
The task of ethical Medical Image Synthesis (MISyn) is to ensure that the MISyn techniques are researched and developed ethically throughout their entire lifecycle, which is essential to prevent the negative impacts of MISyn. To address the ever-increasing needs and requirements for ethical practice of MISyn research and development, we first conduct a theoretical analysis that identifies the key properties of ethical MISyn and intrinsic limits of MISyn. We identify that synthetic images lack inherent grounding in real medical phenomena, cannot fully represent the training medical images, and inevitably introduce new distribution shifts and biases. Ethical risks can arise from not acknowledging the intrinsic limits and weaknesses of synthetic images compared to medical images, with the extreme form manifested as misinformation of MISyn that substitutes synthetic images for medical images without acknowledgment. The resulting ethical harms include eroding trust in the medical imaging dataset environment and causing algorithmic discrimination towards stakeholders and the public. To facilitate collective efforts towards ethical MISyn within and outside the medical image analysis community, we then propose practical supports for ethical practice in MISyn based on the theoretical analysis, including ethical practice recommendations that adapt the existing technical standards, problem formulation, design, and evaluation practice of MISyn to the ethical challenges; and oversight recommendations to facilitate checks and balances from stakeholders and the public. We also present two case studies that demonstrate how to apply the ethical practice recommendations in practice, and identify gaps between existing practice and the ethical practice recommendations.
LG Feb 16, 2022
Guidelines and Evaluation of Clinical Explainable AI in Medical Image Analysis
Weina Jin, Xiaoxiao Li, Mostafa Fatehi et al.
Explainable artificial intelligence (XAI) is essential for enabling clinical users to get informed decision support from AI and comply with evidence-based medical practice. Applying XAI in clinical settings requires proper evaluation criteria to ensure the explanation technique is both technically sound and clinically useful, but specific support is lacking to achieve this goal. To bridge the research gap, we propose the Clinical XAI Guidelines that consist of five criteria a clinical XAI needs to be optimized for. The guidelines recommend choosing an explanation form based on Guideline 1 (G1) Understandability and G2 Clinical relevance. For the chosen explanation form, its specific XAI technique should be optimized for G3 Truthfulness, G4 Informative plausibility, and G5 Computational efficiency. Following the guidelines, we conducted a systematic evaluation on a novel problem of multi-modal medical image explanation with two clinical tasks, and proposed new evaluation metrics accordingly. Sixteen commonly-used heatmap XAI techniques were evaluated and found to be insufficient for clinical use due to their failure in G3 and G4. Our evaluation demonstrated the use of Clinical XAI Guidelines to support the design and evaluation of clinically viable XAI.
CV Jul 11, 2021
One Map Does Not Fit All: Evaluating Saliency Map Explanation on Multi-Modal Medical Images
Weina Jin, Xiaoxiao Li, Ghassan Hamarneh
Being able to explain the prediction to clinical end-users is a necessity to leverage the power of AI models for clinical decision support. For medical images, saliency maps are the most common form of explanation. The maps highlight important features for the AI model's prediction. Although many saliency map methods have been proposed, it is unknown how well they perform on explaining decisions on multi-modal medical images, where each modality/channel carries distinct clinical meanings of the same underlying biomedical phenomenon. Understanding such modality-dependent features is essential for clinical users' interpretation of AI decisions. To tackle this clinically important but technically ignored problem, we propose the MSFI (Modality-Specific Feature Importance) metric to examine whether saliency maps can highlight modality-specific important features. MSFI encodes the clinical requirements on modality prioritization and modality-specific feature localization. Our evaluations on 16 commonly used saliency map methods, including a clinician user study, show that although most saliency map methods captured modality importance information in general, most of them failed to highlight modality-specific important features consistently and precisely. The evaluation results guide the choice of saliency map methods and provide insights for proposing new ones targeting clinical applications.
HC Feb 4, 2021
EUCA: the End-User-Centered Explainable AI Framework
Weina Jin, Jianyu Fan, Diane Gromala et al.
The ability to explain decisions to end-users is a necessity to deploy AI as critical decision support. Yet making AI explainable to non-technical end-users is a relatively ignored and challenging problem. To bridge the gap, we first identify twelve end-user-friendly explanatory forms that do not require technical knowledge to comprehend, including feature-, example-, and rule-based explanations. We then instantiate the explanatory forms as prototyping cards in four AI-assisted critical decision-making tasks, and conduct a user study to co-design low-fidelity prototypes with 32 layperson participants. The results confirm the relevance of using explanatory forms as building blocks of explanations, and identify their properties: pros, cons, applicable explanation goals, and design implications. The explanatory forms, their properties, and prototyping supports (including a suggested prototyping process, design templates and exemplars, and associated algorithms to actualize explanatory forms) constitute the End-User-Centered explainable AI framework EUCA, which is available at http://weinajin.github.io/end-user-xai . It serves as a practical prototyping toolkit for HCI/AI practitioners and researchers to understand user requirements and build end-user-centered explainable AI.
IV Nov 28, 2019
Artificial Intelligence in Glioma Imaging: Challenges and Advances
Weina Jin, Mostafa Fatehi, Kumar Abhishek et al.
Primary brain tumors including gliomas continue to pose significant management challenges to clinicians. While the presentation, the pathology, and the clinical course of these lesions are variable, the initial investigations are usually similar. Patients who are suspected to have a brain tumor will be assessed with computed tomography (CT) and magnetic resonance imaging (MRI). The imaging findings are used by neurosurgeons to determine the feasibility of surgical resection and plan such an undertaking. Imaging studies are also an indispensable tool in tracking tumor progression or its response to treatment. As these imaging studies are non-invasive, relatively cheap and accessible to patients, there have been many efforts over the past two decades to increase the amount of clinically-relevant information that can be extracted from brain imaging. Most recently, artificial intelligence (AI) techniques have been employed to segment and characterize brain tumors, as well as to detect progression or treatment-response. However, the clinical utility of such endeavours remains limited due to challenges in data collection and annotation, model training, and the reliability of AI-generated information. We provide a review of recent advances in addressing the above challenges. First, to overcome the challenge of data paucity, different image imputation and synthesis techniques along with annotation collection efforts are summarized. Next, various training strategies are presented to meet multiple desiderata, such as model performance, generalization ability, data privacy protection, and learning with sparse annotations. Finally, standardized performance evaluation and model interpretability methods have been reviewed. We believe that these technical approaches will facilitate the development of a fully-functional AI tool in the clinical care of patients with gliomas.
HC Apr 7, 2019
Ride N' Rhythm, Bike as an Embodied Musical Instrument to Improve Music Perception for Young Children
Weina Jin, Alissa N. Antle, Diane Gromala
Music plays a crucial role in young children's development. Current research lacks the design of an interactive system for younger children that could generate dynamic music change in response to the children's body movement. In this paper, we present the design of a bike as an embodied musical instrument for young children aged 2-5 years to improve their music perception skills. In the Ride N' Rhythm prototype, the rider's body position maps to the music volume, and the speed of the bike maps to the tempo. The design of the prototype incorporates the Embodied Music Cognition theory and Dalcroze Eurhythmics pedagogy, and aims to internalize the 'intuitive' knowing and musical understanding via the combination of music and body movement.