99.0CYMar 11
Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research DirectionsSaleh Afroogh, Seyd Ishtiaque Ahmed, Petra Ahrweiler et al. · cmu
This study provides a cross-disciplinary examination of Explainable Artificial Intelligence (XAI) approaches-focusing on deep neural networks (DNNs) and large language models (LLMs)-and identifies empirical and conceptual limitations in current XAI. We discuss critical symptoms that stem from deeper root causes (i.e., two paradoxes, two conceptual confusions, and five false assumptions). These fundamental problems within the current XAI research field reveal three insights: experimentally, XAI exhibits significant flaws; conceptually, it is paradoxical; and pragmatically, further attempts to reform the paradoxical XAI might exacerbate its confusion-demanding fundamental shifts and new research directions. To move beyond XAI's limitations, we propose a four-pronged synthesized paradigm shift toward reliable and certified AI development. These four components include: verification-focused Interactive AI (IAI) to establish scientific community protocols for certifying AI system performance rather than attempting post-hoc explanations, AI Epistemology for rigorous scientific foundations, User-Sensible AI to create context-aware systems tailored to specific user communities, and Model-Centered Interpretability for faithful technical analysis-together offering comprehensive post-XAI research directions.
HCFeb 19, 2023
RePrompt: Automatic Prompt Editing to Refine AI-Generative Art Towards Precise ExpressionsYunlong Wang, Shuyuan Shen, Brian Y. Lim
Generative AI models have shown impressive ability to produce images with text prompts, which could benefit creativity in visual art creation and self-expression. However, it is unclear how precisely the generated images express contexts and emotions from the input texts. We explored the emotional expressiveness of AI-generated images and developed RePrompt, an automatic method to refine text prompts toward precise expression of the generated images. Inspired by crowdsourced editing strategies, we curated intuitive text features, such as the number and concreteness of nouns, and trained a proxy model to analyze the feature effects on the AI-generated image. With model explanations of the proxy model, we curated a rubric to adjust text prompts to optimize image generation for precise emotion expression. We conducted simulation and user studies, which showed that RePrompt significantly improves the emotional expressiveness of AI-generated images, especially for negative emotions.
AIMar 16, 2023
IRIS: Interpretable Rubric-Informed Segmentation for Action Quality AssessmentHitoshi Matsuyama, Nobuo Kawaguchi, Brian Y. Lim
AI-driven Action Quality Assessment (AQA) of sports videos can mimic Olympic judges to help score performances as a second opinion or for training. However, these AI methods are uninterpretable and do not justify their scores, which is important for algorithmic accountability. Indeed, to account for their decisions, instead of scoring subjectively, sports judges use a consistent set of criteria - rubric - on multiple actions in each performance sequence. Therefore, we propose IRIS to perform Interpretable Rubric-Informed Segmentation on action sequences for AQA. We investigated IRIS for scoring videos of figure skating performance. IRIS predicts (1) action segments, (2) technical element score differences of each segment relative to base scores, (3) multiple program component scores, and (4) the summed final score. In a modeling study, we found that IRIS performs better than non-interpretable, state-of-the-art models. In a formative user study, practicing figure skaters agreed with the rubric-informed explanations, found them useful, and trusted AI judgments more. This work highlights the importance of using judgment rubrics to account for AI decisions.
AIFeb 2, 2023
Diagrammatization and Abduction to Improve AI Interpretability With Domain-Aligned Explanations for Medical DiagnosisBrian Y. Lim, Joseph P. Cahaly, Chester Y. F. Sng et al.
Many visualizations have been developed for explainable AI (XAI), but they often require further reasoning by users to interpret. Investigating XAI for high-stakes medical diagnosis, we propose improving domain alignment with diagrammatic and abductive reasoning to reduce the interpretability gap. We developed DiagramNet to predict cardiac diagnoses from heart auscultation, select the best-fitting hypothesis based on criteria evaluation, and explain with clinically-relevant murmur diagrams. The ante-hoc interpretable model leverages domain-relevant ontology, representation, and reasoning process to increase trust in expert users. In modeling studies, we found that DiagramNet not only provides faithful murmur shape explanations, but also has better performance than baseline models. We demonstrate the interpretability and trustworthiness of diagrammatic, abductive explanations in a qualitative user study with medical students, showing that clinically-relevant, diagrammatic explanations are preferred over technical saliency map explanations. This work contributes insights into providing domain-aligned explanations for user-centric XAI in complex domains.
99.1HCMar 12
From Control to Foresight: Simulation as a New Paradigm for Human-Agent CollaborationGaole He, Brian Y. Lim
Large Language Models (LLMs) are increasingly used to power autonomous agents for complex, multi-step tasks. However, human-agent interaction remains pointwise and reactive: users approve or correct individual actions to mitigate immediate risks, without visibility into subsequent consequences. This forces users to mentally simulate long-term effects, a cognitively demanding and often inaccurate process. Users have control over individual steps but lack the foresight to make informed decisions. We argue that effective collaboration requires foresight, not just control. We propose simulation-in-the-loop, an interaction paradigm that enables users and agents to explore simulated future trajectories before committing to decisions. Simulation transforms intervention from reactive guesswork into informed exploration, while helping users discover latent constraints and preferences along the way. This perspective paper characterizes the limitations of current paradigms, introduces a conceptual framework for simulation-based collaboration, and illustrates its potential through concrete human-agent collaboration scenarios.
HCMar 6
Beyond Scores: Explainable Intelligent Assessment Strengthens Pre-service Teachers' Assessment LiteracyYuang Wei, Fei Wang, Yifan Zhang et al.
Assessment literacy (AL) is essential for personalized education, yet difficult to cultivate in pre-service teachers. Conventional teacher preparation programs focus on theoretical knowledge, while digital assessment tools commonly provide opaque scores or parameters. These limitations hinder reflection and transfer, leaving AL underdeveloped. We propose XIA, an eXplainable Intelligent Assessment platform that extends statistics-informed support with visualized cognitive diagnostic reasoning, including contrastive and counterfactual explanations. In a pre-post controlled study with 21 pre-service teachers, we combined quantitative tasks and questionnaires with qualitative interviews. The findings offer preliminary evidence that XIA supported reflection, self-regulation, and assessment awareness, and helped reduce assessment errors. Interviews further showed a shift from score-based judgments toward evidence-based reasoning. This work contributes insights into the design of intelligent assessment tools, showing how explanatory scaffolding can bridge assessment theory and classroom practice and support the cultivation of AL in teacher education.
9.0AIApr 30
CoAX: Cognitive-Oriented Attribution eXplanation User Model of Human Understanding of AI ExplanationsLouth Bin Rawshan, Zhuoyu Wang, Brian Y. Lim
Explainable AI (XAI) aims to improve user understanding and decisions when using AI models. However, despite innovations in XAI, recent user evaluations reveal that this goal remains elusive. Understanding human cognition can help explain why users struggle to effectively use AI explanations. Focusing on reasoning on structured (tabular) data, we examined various reasoning strategies for different XAI methods (none, feature importance, feature attribution) in the decision task of anticipating AI decisions (i.e., forward simulation). We i) elicited reasoning strategies from a formative user study, and ii) collected decisions from a summative user study. Using cognitive modeling, we implemented the processes underlying each reasoning strategy and evaluated their alignment with human decision-making. We found that our models better fit human decisions than baseline machine learning proxies, providing insights into which reasoning strategies are (in)effective. We then demonstrate how the fitted model can be used to form hypotheses and investigate research questions that are costly to study with real human participants. This work contributes to debugging human understanding of XAI, informing the future development of more usable and interpretable AI explanations.
HCApr 10, 2024
Incremental XAI: Memorable Understanding of AI with Incremental ExplanationsJessica Y. Bo, Pan Hao, Brian Y. Lim
Many explainable AI (XAI) techniques strive for interpretability by providing concise salient information, such as sparse linear factors. However, users either only see inaccurate global explanations, or highly-varying local explanations. We propose to provide more detailed explanations by leveraging the human cognitive capacity to accumulate knowledge by incrementally receiving more details. Focusing on linear factor explanations (factors $\times$ values = outcome), we introduce Incremental XAI to automatically partition explanations for general and atypical instances by providing Base + Incremental factors to help users read and remember more faithful explanations. Memorability is improved by reusing base factors and reducing the number of factors shown in atypical cases. In modeling, formative, and summative user studies, we evaluated the faithfulness, memorability and understandability of Incremental XAI against baseline explanation methods. This work contributes towards more usable explanation that users can better ingrain to facilitate intuitive engagement with AI.
HCFeb 15, 2022
IF-City: Intelligible Fair City Planning to Measure, Explain and Mitigate InequalityYan Lyu, Hangxin Lu, Min Kyung Lee et al.
With the increasing pervasiveness of Artificial Intelligence (AI), many visual analytics tools have been proposed to examine fairness, but they mostly focus on data scientist users. Instead, tackling fairness must be inclusive and involve domain experts with specialized tools and workflows. Thus, domain-specific visualizations are needed for algorithmic fairness. Furthermore, while much work on AI fairness has focused on predictive decisions, less has been done for fair allocation and planning, which require human expertise and iterative design to integrate myriad constraints. We propose the Intelligible Fair Allocation (IF-Alloc) Framework that leverages explanations of causal attribution (Why), contrastive (Why Not) and counterfactual reasoning (What If, How To) to aid domain experts to assess and alleviate unfairness in allocation problems. We apply the framework to fair urban planning for designing cities that provide equal access to amenities and benefits for diverse resident types. Specifically, we propose an interactive visual tool, Intelligible Fair City Planner (IF-City), to help urban planners to perceive inequality across groups, identify and attribute sources of inequality, and mitigate inequality with automatic allocation simulations and constraint-satisfying recommendations. We demonstrate and evaluate the usage and usefulness of IF-City on a real neighborhood in New York City, US, with practicing urban planners from multiple countries, and discuss generalizing our findings, application, and framework to other use cases and applications of fair allocation.
HCJan 30, 2022
Debiased-CAM to mitigate systematic error with faithful visual explanations of machine learningWencan Zhang, Mariella Dimiccoli, Brian Y. Lim
Model explanations such as saliency maps can improve user trust in AI by highlighting important features for a prediction. However, these become distorted and misleading when explaining predictions of images that are subject to systematic error (bias). Furthermore, the distortions persist despite model fine-tuning on images biased by different factors (blur, color temperature, day/night). We present Debiased-CAM to recover explanation faithfulness across various bias types and levels by training a multi-input, multi-task model with auxiliary tasks for explanation and bias level predictions. In simulation studies, the approach not only enhanced prediction accuracy, but also generated highly faithful explanations about these predictions as if the images were unbiased. In user studies, debiased explanations improved user task performance, perceived truthfulness and perceived helpfulness. Debiased training can provide a versatile platform for robust performance and explanation faithfulness for a wide range of applications with data biases.
HCDec 28, 2021
Towards Relatable Explainable AI with the Perceptual ProcessWencan Zhang, Brian Y. Lim
Machine learning models need to provide contrastive explanations, since people often seek to understand why a puzzling prediction occurred instead of some expected outcome. Current contrastive explanations are rudimentary comparisons between examples or raw features, which remain difficult to interpret, since they lack semantic meaning. We argue that explanations must be more relatable to other concepts, hypotheticals, and associations. Inspired by the perceptual process from cognitive psychology, we propose the XAI Perceptual Processing Framework and RexNet model for relatable explainable AI with Contrastive Saliency, Counterfactual Synthetic, and Contrastive Cues explanations. We investigated the application of vocal emotion recognition, and implemented a modular multi-task deep neural network to predict and explain emotions from speech. From think-aloud and controlled studies, we found that counterfactual explanations were useful and further enhanced with semantic cues, but not saliency explanations. This work provides insights into providing and evaluating relatable contrastive explainable AI for perception applications.
HCSep 21, 2021
SalienTrack: providing salient information for semi-automated self-tracking feedback with model explanationsYunlong Wang, Jiaying Liu, Homin Park et al.
Self-tracking can improve people's awareness of their unhealthy behaviors and support reflection to inform behavior change. Increasingly, new technologies make tracking easier, leading to large amounts of tracked data. However, much of that information is not salient for reflection and self-awareness. To tackle this burden for reflection, we created the SalienTrack framework, which aims to 1) identify salient tracking events, 2) select the salient details of those events, 3) explain why they are informative, and 4) present the details as manually elicited or automatically shown feedback. We implemented SalienTrack in the context of nutrition tracking. To do this, we first conducted a field study to collect photo-based mobile food tracking over 1-5 weeks. We then report how we used this data to train an explainable-AI model of salience. Finally, we created interfaces to present salient information and conducted a formative user study to gain insights about how SalienTrack could be integrated into an interface for reflection. Our key contributions are the SalienTrack framework, a demonstration of its implementation for semi-automated feedback in an important and challenging self-tracking context and a discussion of the broader uses of the framework.
HCSep 21, 2021
Interpretable Directed Diversity: Leveraging Model Explanations for Iterative Crowd IdeationYunlong Wang, Priyadarshini Venkatesh, Brian Y. Lim
Feedback in creativity support tools can help crowdworkers to improve their ideations. However, current feedback methods require human assessment from facilitators or peers. This is not scalable to large crowds. We propose Interpretable Directed Diversity to automatically predict ideation quality and diversity scores, and provide AI explanations - Attribution, Contrastive Attribution, and Counterfactual Suggestions - to feedback on why ideations were scored (low), and how to get higher scores. These explanations provide multi-faceted feedback as users iteratively improve their ideations. We conducted formative and controlled user studies to understand the usage and usefulness of explanations to improve ideation diversity and quality. Users appreciated that explanation feedback helped focus their efforts and provided directions for improvement. This resulted in explanations improving diversity compared to no feedback or feedback with scores only. Hence, our approach opens opportunities for explainable AI towards scalable and rich feedback for iterative crowd ideation and creativity support tools.
CVApr 26, 2021
Exploiting Explanations for Model Inversion AttacksXuejun Zhao, Wencan Zhang, Xiaokui Xiao et al.
The successful deployment of artificial intelligence (AI) in many domains from healthcare to hiring requires their responsible use, particularly in model explanations and privacy. Explainable artificial intelligence (XAI) provides more information to help users to understand model decisions, yet this additional knowledge exposes additional risks for privacy attacks. Hence, providing explanation harms privacy. We study this risk for image-based model inversion attacks and identified several attack architectures with increasing performance to reconstruct private image data from model explanations. We have developed several multi-modal transposed CNN architectures that achieve significantly higher inversion performance than using the target model prediction only. These XAI-aware inversion models were designed to exploit the spatial knowledge in image explanations. To understand which explanations have higher privacy risk, we analyzed how various explanation types and factors influence inversion performance. In spite of some models not providing explanations, we further demonstrate increased inversion performance even for non-explainable target models by exploiting explanations of surrogate models through attention transfer. This method first inverts an explanation from the target prediction, then reconstructs the target image. These threats highlight the urgent and significant privacy risks of explanations and calls attention for new privacy preservation techniques that balance the dual-requirement for AI explainability and privacy.
LGJan 23, 2021
Show or Suppress? Managing Input Uncertainty in Machine Learning Model ExplanationsDanding Wang, Wencan Zhang, Brian Y. Lim
Feature attribution is widely used in interpretable machine learning to explain how influential each measured input feature value is for an output inference. However, measurements can be uncertain, and it is unclear how the awareness of input uncertainty can affect the trust in explanations. We propose and study two approaches to help users to manage their perception of uncertainty in a model explanation: 1) transparently show uncertainty in feature attributions to allow users to reflect on, and 2) suppress attribution to features with uncertain measurements and shift attribution to other features by regularizing with an uncertainty penalty. Through simulation experiments, qualitative interviews, and quantitative user evaluations, we identified the benefits of moderately suppressing attribution uncertainty, and concerns regarding showing attribution uncertainty. This work adds to the understanding of handling and communicating uncertainty for model interpretability.
HCJan 15, 2021
Directed Diversity: Leveraging Language Embedding Distances for Collective Creativity in Crowd IdeationSamuel Rhys Cox, Yunlong Wang, Ashraf Abdul et al.
Crowdsourcing can collect many diverse ideas by prompting ideators individually, but this can generate redundant ideas. Prior methods reduce redundancy by presenting peers' ideas or peer-proposed prompts, but these require much human coordination. We introduce Directed Diversity, an automatic prompt selection approach that leverages language model embedding distances to maximize diversity. Ideators can be directed towards diverse prompts and away from prior ideas, thus improving their collective creativity. Since there are diverse metrics of diversity, we present a Diversity Prompting Evaluation Framework consolidating metrics from several research disciplines to analyze along the ideation chain - prompt selection, prompt creativity, prompt-ideation mediation, and ideation creativity. Using this framework, we evaluated Directed Diversity in a series of a simulation study and four user studies for the use case of crowdsourcing motivational messages to encourage physical activity. We show that automated diverse prompting can variously improve collective creativity across many nuanced metrics of diversity.
CVDec 10, 2020
Debiased-CAM to mitigate image perturbations with faithful visual explanations of machine learningWencan Zhang, Mariella Dimiccoli, Brian Y. Lim
Model explanations such as saliency maps can improve user trust in AI by highlighting important features for a prediction. However, these become distorted and misleading when explaining predictions of images that are subject to systematic error (bias) by perturbations and corruptions. Furthermore, the distortions persist despite model fine-tuning on images biased by different factors (blur, color temperature, day/night). We present Debiased-CAM to recover explanation faithfulness across various bias types and levels by training a multi-input, multi-task model with auxiliary tasks for explanation and bias level predictions. In simulation studies, the approach not only enhanced prediction accuracy, but also generated highly faithful explanations about these predictions as if the images were unbiased. In user studies, debiased explanations improved user task performance, perceived truthfulness and perceived helpfulness. Debiased training can provide a versatile platform for robust performance and explanation faithfulness for a wide range of applications with data biases.
HCNov 21, 2019
NaMemo: Enhancing Lecturers' Interpersonal Competence of Remembering Students' NamesGuang Jiang, Mengzhen Shi, Ying Su et al.
Addressing students by their names helps a teacher to start building rapport with students and thus facilitates their classroom participation. However, this basic yet effective skill has become rather challenging for university lecturers, who have to handle large-sized (sometimes exceeding 100) groups in their daily teaching. To enhance lecturers' competence in delivering interpersonal interaction, we developed NaMemo, a real-time name-indicating system based on a dedicated face-recognition pipeline. This paper presents the system design, the pilot feasibility test, and our plan for the following study, which aims to evaluate NaMemo's impacts on learning and teaching, as well as to probe design implications including privacy considerations.