Ulf Johansson

LG
h-index11
6papers
68citations
Novelty40%
AI Score31

6 Papers

AIMar 25, 2022
A Meta Survey of Quality Evaluation Criteria in Explanation Methods

Helena Löfström, Karl Hammar, Ulf Johansson

Explanation methods and their evaluation have become a significant issue in explainable artificial intelligence (XAI) due to the recent surge of opaque AI models in decision support systems (DSS). Since the most accurate AI models are opaque with low transparency and comprehensibility, explanations are essential for bias detection and control of uncertainty. There are a plethora of criteria to choose from when evaluating explanation method quality. However, since existing criteria focus on evaluating single explanation methods, it is not obvious how to compare the quality of different methods. This lack of consensus creates a critical shortage of rigour in the field, although little is written about comparative evaluations of explanation methods. In this paper, we have conducted a semi-systematic meta-survey over fifteen literature surveys covering the evaluation of explainability to identify existing criteria usable for comparative evaluations of explanation methods. The main contribution in the paper is the suggestion to use appropriate trust as a criterion to measure the outcome of the subjective evaluation criteria and consequently make comparative evaluations possible. We also present a model of explanation quality aspects. In the model, criteria with similar definitions are grouped and related to three identified aspects of quality; model, explanation, and user. We also notice four commonly accepted criteria (groups) in the literature, covering all aspects of explanation quality: Performance, appropriate trust, explanation satisfaction, and fidelity. We suggest the model be used as a chart for comparative evaluations to create more generalisable research in explanation quality.

LGAug 30, 2023
Calibrated Explanations for Regression

Tuwe Löfström, Helena Löfström, Ulf Johansson et al.

Artificial Intelligence (AI) is often an integral part of modern decision support systems. The best-performing predictive models used in AI-based decision support systems lack transparency. Explainable Artificial Intelligence (XAI) aims to create AI systems that can explain their rationale to human users. Local explanations in XAI can provide information about the causes of individual predictions in terms of feature importance. However, a critical drawback of existing local explanation methods is their inability to quantify the uncertainty associated with a feature's importance. This paper introduces an extension of a feature importance explanation method, Calibrated Explanations, previously only supporting classification, with support for standard regression and probabilistic regression, i.e., the probability that the target is above an arbitrary threshold. The extension for regression keeps all the benefits of Calibrated Explanations, such as calibration of the prediction from the underlying model with confidence intervals, uncertainty quantification of feature importance, and allows both factual and counterfactual explanations. Calibrated Explanations for standard regression provides fast, reliable, stable, and robust explanations. Calibrated Explanations for probabilistic regression provides an entirely new way of creating probabilistic explanations from any ordinary regression model, allowing dynamic selection of thresholds. The method is model agnostic with easily understood conditional rules. An implementation in Python is freely available on GitHub and for installation using both pip and conda, making the results in this paper easily replicable.

LGAug 23, 2023
Approximating Score-based Explanation Techniques Using Conformal Regression

Amr Alkhatib, Henrik Boström, Sofiane Ennadir et al.

Score-based explainable machine-learning techniques are often used to understand the logic behind black-box models. However, such explanation techniques are often computationally expensive, which limits their application in time-critical contexts. Therefore, we propose and investigate the use of computationally less costly regression models for approximating the output of score-based explanation techniques, such as SHAP. Moreover, validity guarantees for the approximated values are provided by the employed inductive conformal prediction framework. We propose several non-conformity measures designed to take the difficulty of approximating the explanations into account while keeping the computational cost low. We present results from a large-scale empirical investigation, in which the approximate explanations generated by our proposed models are evaluated with respect to efficiency (interval size). The results indicate that the proposed method can significantly improve execution time compared to the fast version of SHAP, TreeSHAP. The results also suggest that the proposed method can produce tight intervals, while providing validity guarantees. Moreover, the proposed approach allows for comparing explanations of different approximation methods and selecting a method based on how informative (tight) are the predicted intervals.

LGJun 11, 2023
Well-Calibrated Probabilistic Predictive Maintenance using Venn-Abers

Ulf Johansson, Tuwe Löfström, Cecilia Sönströd

When using machine learning for fault detection, a common problem is the fact that most data sets are very unbalanced, with the minority class (a fault) being the interesting one. In this paper, we investigate the usage of Venn-Abers predictors, looking specifically at the effect on the minority class predictions. A key property of Venn-Abers predictors is that they output well-calibrated probability intervals. In the experiments, we apply Venn-Abers calibration to decision trees, random forests and XGBoost models, showing how both overconfident and underconfident models are corrected. In addition, the benefit of using the valid probability intervals produced by Venn-Abers for decision support is demonstrated. When using techniques producing opaque underlying models, e.g., random forest and XGBoost, each prediction will consist of not only the label, but also a valid probability interval, where the width is an indication of the confidence in the estimate. Adding Venn-Abers on top of a decision tree allows inspection and analysis of the model, to understand both the underlying relationship, and finding out in which parts of feature space that the model is accurate and/or confident.

MLJun 26, 2025
Classification with Reject Option: Distribution-free Error Guarantees via Conformal Prediction

Johan Hallberg Szabadváry, Tuwe Löfström, Ulf Johansson et al.

Machine learning (ML) models always make a prediction, even when they are likely to be wrong. This causes problems in practical applications, as we do not know if we should trust a prediction. ML with reject option addresses this issue by abstaining from making a prediction if it is likely to be incorrect. In this work, we formalise the approach to ML with reject option in binary classification, deriving theoretical guarantees on the resulting error rate. This is achieved through conformal prediction (CP), which produce prediction sets with distribution-free validity guarantees. In binary classification, CP can output prediction sets containing exactly one, two or no labels. By accepting only the singleton predictions, we turn CP into a binary classifier with reject option. Here, CP is formally put in the framework of predicting with reject option. We state and prove the resulting error rate, and give finite sample estimates. Numerical examples provide illustrations of derived error rate through several different conformal prediction settings, ranging from full conformal prediction to offline batch inductive conformal prediction. The former has a direct link to sharp validity guarantees, whereas the latter is more fuzzy in terms of validity guarantees but can be used in practice. Error-reject curves illustrate the trade-off between error rate and reject rate, and can serve to aid a user to set an acceptable error rate or reject rate in practice.

AIMay 3, 2023
Calibrated Explanations: with Uncertainty Information and Counterfactuals

Helena Lofstrom, Tuwe Lofstrom, Ulf Johansson et al.

While local explanations for AI models can offer insights into individual predictions, such as feature importance, they are plagued by issues like instability. The unreliability of feature weights, often skewed due to poorly calibrated ML models, deepens these challenges. Moreover, the critical aspect of feature importance uncertainty remains mostly unaddressed in Explainable AI (XAI). The novel feature importance explanation method presented in this paper, called Calibrated Explanations (CE), is designed to tackle these issues head-on. Built on the foundation of Venn-Abers, CE not only calibrates the underlying model but also delivers reliable feature importance explanations with an exact definition of the feature weights. CE goes beyond conventional solutions by addressing output uncertainty. It accomplishes this by providing uncertainty quantification for both feature weights and the model's probability estimates. Additionally, CE is model-agnostic, featuring easily comprehensible conditional rules and the ability to generate counterfactual explanations with embedded uncertainty quantification. Results from an evaluation with 25 benchmark datasets underscore the efficacy of CE, making it stand as a fast, reliable, stable, and robust solution.