AI · Jun 16, 2025

Evaluating Explainability: A Framework for Systematic Assessment and Reporting of Explainable AI Features

arXiv:2506.13917v1 · 2 citations · h-index: 6
Originality: Incremental advance
AI Analysis

This paper addresses the problem of evaluating explainability for developers and users of AI-based medical devices. It is rated an incremental advance because it builds on existing explainability concepts.

The paper tackles the lack of evaluation techniques for explainable AI by proposing a framework with four criteria (consistency, plausibility, fidelity, and usefulness) to assess explanation quality, and applies it to explanation heatmaps for breast lesion detection on synthetic mammograms.

Explainability features are intended to provide insight into the internal mechanisms of an AI device, but techniques for assessing the quality of the explanations they provide are lacking. We propose a framework to assess and report explainable AI features. The framework is based on four criteria: 1) Consistency quantifies the variability of explanations across similar inputs, 2) Plausibility estimates how close the explanation is to the ground truth, 3) Fidelity assesses the alignment between the explanation and the model's internal mechanisms, and 4) Usefulness evaluates the impact of the explanation on task performance. We describe these four criteria and give examples of how they can be evaluated. We also developed a scorecard for AI explainability methods that serves as a complete description and evaluation to accompany this type of algorithm. As a case study, we use Ablation CAM and Eigen CAM to illustrate the evaluation of explanation heatmaps for the detection of breast lesions on synthetic mammograms. The first three criteria are evaluated for clinically relevant scenarios. Our proposed framework establishes criteria through which the quality of explanations provided by AI models can be evaluated. We intend for our framework to spark a dialogue regarding the value provided by explainability features and help improve the development and evaluation of AI-based medical devices.
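
To make the first two criteria concrete, here is a minimal Python sketch of how consistency and plausibility might be scored for explanation heatmaps. The metrics are assumptions chosen for illustration, not the paper's definitions: consistency is taken as the mean pairwise Pearson correlation between heatmaps produced for similar inputs, and plausibility as the intersection over union (IoU) between a thresholded heatmap and a binary ground-truth lesion mask. The function names and the 0.5 threshold are hypothetical. Fidelity and usefulness are not covered, since they require access to the model and to a reader study, respectively.

```python
from itertools import combinations
from statistics import mean


def flatten(heatmap):
    """Turn a 2D heatmap (list of rows) into a flat list of values."""
    return [value for row in heatmap for value in row]


def pearson(a, b):
    """Pearson correlation between two equally sized flat heatmaps."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant heatmap carries no spatial information
    return cov / (var_a * var_b) ** 0.5


def consistency_score(heatmaps):
    """Assumed metric: mean pairwise correlation across similar inputs.

    Needs at least two heatmaps; higher means more consistent.
    """
    flats = [flatten(h) for h in heatmaps]
    return mean(pearson(a, b) for a, b in combinations(flats, 2))


def plausibility_iou(heatmap, truth_mask, threshold=0.5):
    """Assumed metric: IoU of the thresholded heatmap and the truth mask."""
    hot = [value >= threshold for value in flatten(heatmap)]
    truth = [bool(value) for value in flatten(truth_mask)]
    intersection = sum(h and t for h, t in zip(hot, truth))
    union = sum(h or t for h, t in zip(hot, truth))
    return intersection / union if union else 1.0


# Toy 2x2 heatmaps for two slightly different versions of the same image.
heatmap_a = [[0.9, 0.1], [0.2, 0.0]]
heatmap_b = [[0.8, 0.2], [0.1, 0.0]]
lesion_mask = [[1, 0], [0, 0]]  # hypothetical ground-truth lesion location

print("consistency:", consistency_score([heatmap_a, heatmap_b]))
print("plausibility:", plausibility_iou(heatmap_a, lesion_mask))
```

In practice the heatmaps would come from a method such as Ablation CAM or Eigen CAM, and the choice of similarity measure and threshold would itself need to be reported alongside the scores.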

