AISep 29, 2023
Tell Me a Story! Narrative-Driven XAI with Large Language ModelsDavid Martens, James Hinns, Camille Dams et al.
In many AI applications today, the predominance of black-box machine learning models, due to their typically higher accuracy, amplifies the need for Explainable AI (XAI). Existing XAI approaches, such as the widely used SHAP values or counterfactual (CF) explanations, are arguably often too technical for users to understand and act upon. To enhance comprehension of explanations of AI decisions and the overall user experience, we introduce XAIstories, which leverage Large Language Models to provide narratives about how AI predictions are made: SHAPstories do so based on SHAP explanations, while CFstories do so for CF explanations. We study the impact of our approach on users' experience and understanding of AI predictions. Our results are striking: over 90% of the surveyed general audience finds the narratives generated by SHAPstories convincing. Data scientists primarily see the value of SHAPstories in communicating explanations to a general audience, with 83% of data scientists indicating they are likely to use SHAPstories for this purpose. In an image classification setting, CFstories are considered more or equally convincing as the users' own crafted stories by more than 75% of the participants. CFstories additionally bring a tenfold speed gain in creating a narrative. We also find that SHAPstories help users to more accurately summarize and understand AI decisions, in a credit scoring setting we test, correctly answering comprehension questions significantly more often than they do when only SHAP values are provided. The results thereby suggest that XAIstories may significantly help explaining and understanding AI predictions, ultimately supporting better decision-making in various applications.
LGJan 8
On the Definition and Detection of Cherry-Picking in Counterfactual ExplanationsJames Hinns, Sofie Goethals, Stephan Van der Veeken et al.
Counterfactual explanations are widely used to communicate how inputs must change for a model to alter its prediction. For a single instance, many valid counterfactuals can exist, which leaves open the possibility for an explanation provider to cherry-pick explanations that better suit a narrative of their choice, highlighting favourable behaviour and withholding examples that reveal problematic behaviour. We formally define cherry-picking for counterfactual explanations in terms of an admissible explanation space, specified by the generation procedure, and a utility function. We then study to what extent an external auditor can detect such manipulation. Considering three levels of access to the explanation process: full procedural access, partial procedural access, and explanation-only access, we show that detection is extremely limited in practice. Even with full procedural access, cherry-picked explanations can remain difficult to distinguish from non cherry-picked explanations, because the multiplicity of valid counterfactuals and flexibility in the explanation specification provide sufficient degrees of freedom to mask deliberate selection. Empirically, we demonstrate that this variability often exceeds the effect of cherry-picking on standard counterfactual quality metrics such as proximity, plausibility, and sparsity, making cherry-picked explanations statistically indistinguishable from baseline explanations. We argue that safeguards should therefore prioritise reproducibility, standardisation, and procedural constraints over post-hoc detection, and we provide recommendations for algorithm developers, explanation providers, and auditors.
CLDec 13, 2024
How good is my story? Towards quantitative metrics for evaluating LLM-generated XAI narrativesTimour Ichmoukhamedov, James Hinns, David Martens
A rapidly developing application of LLMs in XAI is to convert quantitative explanations such as SHAP into user-friendly narratives to explain the decisions made by smaller prediction models. Evaluating the narratives without relying on human preference studies or surveys is becoming increasingly important in this field. In this work we propose a framework and explore several automated metrics to evaluate LLM-generated narratives for explanations of tabular classification tasks. We apply our approach to compare several state-of-the-art LLMs across different datasets and prompt types. As a demonstration of their utility, these metrics allow us to identify new challenges related to LLM hallucinations for XAI narratives.
CVMay 24, 2024
Exposing Image Classifier Shortcuts with Counterfactual Frequency (CoF) TablesJames Hinns, David Martens
The rise of deep learning in image classification has brought unprecedented accuracy but also highlighted a key issue: the use of 'shortcuts' by models. Such shortcuts are easy-to-learn patterns from the training data that fail to generalise to new data. Examples include the use of a copyright watermark to recognise horses, snowy background to recognise huskies, or ink markings to detect malignant skin lesions. The explainable AI (XAI) community has suggested using instance-level explanations to detect shortcuts without external data, but this requires the examination of many explanations to confirm the presence of such shortcuts, making it a labour-intensive process. To address these challenges, we introduce Counterfactual Frequency (CoF) tables, a novel approach that aggregates instance-based explanations into global insights, and exposes shortcuts. The aggregation implies the need for some semantic concepts to be used in the explanations, which we solve by labelling the segments of an image. We demonstrate the utility of CoF tables across several datasets, revealing the shortcuts learned from them.
CVJun 29, 2025
Aggregating Local Saliency Maps for Semi-Global Explainable Image ClassificationJames Hinns, David Martens
Deep learning dominates image classification tasks, yet understanding how models arrive at predictions remains a challenge. Much research focuses on local explanations of individual predictions, such as saliency maps, which visualise the influence of specific pixels on a model's prediction. However, reviewing many of these explanations to identify recurring patterns is infeasible, while global methods often oversimplify and miss important local behaviours. To address this, we propose Segment Attribution Tables (SATs), a method for summarising local saliency explanations into (semi-)global insights. SATs take image segments (such as "eyes" in Chihuahuas) and leverage saliency maps to quantify their influence. These segments highlight concepts the model relies on across instances and reveal spurious correlations, such as reliance on backgrounds or watermarks, even when out-of-distribution test performance sees little change. SATs can explain any classifier for which a form of saliency map can be produced, using segmentation maps that provide named segments. SATs bridge the gap between oversimplified global summaries and overly detailed local explanations, offering a practical tool for analysing and debugging image classifiers.
CLJun 20, 2025
Cash or Comfort? How LLMs Value Your InconvenienceMateusz Cedro, Timour Ichmoukhamedov, Sofie Goethals et al.
Large Language Models (LLMs) are increasingly proposed as near-autonomous artificial intelligence (AI) agents capable of making everyday decisions on behalf of humans. Although LLMs perform well on many technical tasks, their behaviour in personal decision-making remains less understood. Previous studies have assessed their rationality and moral alignment with human decisions. However, the behaviour of AI assistants in scenarios where financial rewards are at odds with user comfort has not yet been thoroughly explored. In this paper, we tackle this problem by quantifying the prices assigned by multiple LLMs to a series of user discomforts: additional walking, waiting, hunger and pain. We uncover several key concerns that strongly question the prospect of using current LLMs as decision-making assistants: (1) a large variance in responses between LLMs, (2) within a single LLM, responses show fragility to minor variations in prompt phrasing (e.g., reformulating the question in the first person can considerably alter the decision), (3) LLMs can accept unreasonably low rewards for major inconveniences (e.g., 1 Euro to wait 10 hours), and (4) LLMs can reject monetary gains where no discomfort is imposed (e.g., 1,000 Euro to wait 0 minutes). These findings emphasize the need for scrutiny of how LLMs value human inconvenience, particularly as we move toward applications where such cash-versus-comfort trade-offs are made on users' behalf.