LGJan 26, 2023
Finding Regions of Counterfactual Explanations via Robust OptimizationDonato Maragno, Jannis Kurtz, Tabea E. Röber et al.
Counterfactual explanations play an important role in detecting bias and improving the explainability of data-driven classification models. A counterfactual explanation (CE) is a minimal perturbed data point for which the decision of the model changes. Most of the existing methods can only provide one CE, which may not be achievable for the user. In this work we derive an iterative method to calculate robust CEs, i.e. CEs that remain valid even after the features are slightly perturbed. To this end, our method provides a whole region of CEs allowing the user to choose a suitable recourse to obtain a desired outcome. We use algorithmic ideas from robust optimization and prove convergence results for the most common machine learning methods including logistic regression, decision trees, random forests, and neural networks. Our experiments show that our method can efficiently generate globally optimal robust CEs for a variety of common data sets and classification models.
LGJul 3, 2023
Fixing confirmation bias in feature attribution methods via semantic matchGiovanni Cinà, Daniel Fernandez-Llaneza, Ludovico Deponte et al.
Feature attribution methods have become a staple method to disentangle the complex behavior of black box models. Despite their success, some scholars have argued that such methods suffer from a serious flaw: they do not allow a reliable interpretation in terms of human concepts. Simply put, visualizing an array of feature contributions is not enough for humans to conclude something about a model's internal representations, and confirmation bias can trick users into false beliefs about model behavior. We argue that a structured approach is required to test whether our hypotheses on the model are confirmed by the feature attributions. This is what we call the "semantic match" between human concepts and (sub-symbolic) explanations. Building on the conceptual framework put forward in Cinà et al. [2023], we propose a structured approach to evaluate semantic match in practice. We showcase the procedure in a suite of experiments spanning tabular and image data, and show how the assessment of semantic match can give insight into both desirable (e.g., focusing on an object relevant for prediction) and undesirable model behaviors (e.g., focusing on a spurious correlation). We couple our experimental results with an analysis on the metrics to measure semantic match, and argue that this approach constitutes the first step towards resolving the issue of confirmation bias in XAI.
AIJan 5, 2023
Semantic match: Debugging feature attribution methods in XAI for healthcareGiovanni Cinà, Tabea E. Röber, Rob Goedhart et al.
The recent spike in certified Artificial Intelligence (AI) tools for healthcare has renewed the debate around adoption of this technology. One thread of such debate concerns Explainable AI (XAI) and its promise to render AI devices more transparent and trustworthy. A few voices active in the medical AI space have expressed concerns on the reliability of Explainable AI techniques and especially feature attribution methods, questioning their use and inclusion in guidelines and standards. Despite valid concerns, we argue that existing criticism on the viability of post-hoc local explainability methods throws away the baby with the bathwater by generalizing a problem that is specific to image data. We begin by characterizing the problem as a lack of semantic match between explanations and human understanding. To understand when feature importance can be used reliably, we introduce a distinction between feature importance of low- and high-level features. We argue that for data types where low-level features come endowed with a clear semantics, such as tabular data like Electronic Health Records (EHRs), semantic match can be obtained, and thus feature attribution methods can still be employed in a meaningful and useful way. Finally, we sketch a procedure to test whether semantic match has been achieved.
LGSep 22, 2022
Counterfactual Explanations Using Optimization With Constraint LearningDonato Maragno, Tabea E. Röber, Ilker Birbil
To increase the adoption of counterfactual explanations in practice, several criteria that these should adhere to have been put forward in the literature. We propose counterfactual explanations using optimization with constraint learning (CE-OCL), a generic and flexible approach that addresses all these criteria and allows room for further extensions. Specifically, we discuss how we can leverage an optimization with constraint learning framework for the generation of counterfactual explanations, and how components of this framework readily map to the criteria. We also propose two novel modeling approaches to address data manifold closeness and diversity, which are two key criteria for practical counterfactual explanations. We test CE-OCL on several datasets and present our results in a case study. Compared against the current state-of-the-art methods, CE-OCL allows for more flexibility and has an overall superior performance in terms of several evaluation metrics proposed in related work.
HCFeb 22
Improving understanding and trust in AI: How users benefit from interval-based counterfactual explanationsTabea E. Röber, Paul Festor, Rob Goedhart et al.
Experimental user studies evaluating the effectiveness of different subtypes of post-hoc explanations for black-box models are largely nonexistent. Therefore, the aim of this study was to investigate and evaluate how different types of counterfactual explanations, namely single point explanations and interval-based explanations, affect both model understanding and (demonstrated) trust. We conducted an online user study using a within-subjects experimental design, where the experimental arms were (i) no explanation (control), (ii) feature importance scores, (iii) point counterfactual explanations, and (iv) interval counterfactual explanations. Our results clearly show the superiority of interval explanations over other tested explanation types in increasing both model understanding and demonstrated trust in the AI. We could not support findings of some previous studies showing an effect of point counterfactual explanations compared to the control group. Our results further highlight the role individual differences in, for example, cognitive style or personality, in explanation effectiveness.
LGApr 21, 2021
Rule Generation for Classification: Scalability, Interpretability, and FairnessTabea E. Röber, Adia C. Lumadjeng, M. Hakan Akyüz et al.
We introduce a new rule-based optimization method for classification with constraints. The proposed method leverages column generation for linear programming, and hence, is scalable to large datasets. The resulting pricing subproblem is shown to be NP-Hard. We recourse to a decision tree-based heuristic and solve a proxy pricing subproblem for acceleration. The method returns a set of rules along with their optimal weights indicating the importance of each rule for learning. We address interpretability and fairness by assigning cost coefficients to the rules and introducing additional constraints. In particular, we focus on local interpretability and generalize a separation criterion in fairness to multiple sensitive attributes and classes. We test the performance of the proposed methodology on a collection of datasets and present a case study to elaborate on its different aspects. The proposed rule-based learning method exhibits a good compromise between local interpretability and fairness on the one side, and accuracy on the other side.