LG AI CV | May 25, 2023

An Experimental Investigation into the Evaluation of Explainability Methods

Sédrick Stassin, Alexandre Englebert, Géraldin Nanfack, Julien Albert, Nassim Versbraegen, Gilles Peiffer, Miriam Doh, Nicolas Riche, Benoît Frenay, Christophe De Vleeschouwer

arXiv:2305.16361v12.0 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for standardized evaluation in explainable AI. Its contribution is incremental: it compares existing evaluation metrics rather than introducing new ones.

The paper compares 14 evaluation metrics for explainable AI methods. It finds that some metrics are highly correlated and that metric values are highly sensitive to the baseline hyperparameter, and it uses dummy methods to expose the limitations of the metrics.

EXplainable Artificial Intelligence (XAI) aims to help users grasp the reasoning behind the predictions of an Artificial Intelligence (AI) system. Many XAI approaches have emerged in recent years. Consequently, a subfield devoted to the evaluation of XAI methods has gained considerable attention, with the aim of determining which methods provide the best explanations according to various approaches and criteria. However, the literature lacks a comparison of the evaluation metrics themselves, that is, of the metrics one can use to evaluate XAI methods. This work aims to fill this gap by comparing 14 different metrics applied to nine state-of-the-art XAI methods and three dummy methods (e.g., random saliency maps) used as references. Experimental results show which of these metrics produce highly correlated results, indicating potential redundancy. We also demonstrate the significant impact of varying the baseline hyperparameter on the evaluation metric values. Finally, we use dummy methods to assess the reliability of metrics in terms of ranking, pointing out their limitations.
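The two checks described in the abstract, correlation between metrics and ranking against a dummy reference, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the metric and method names are hypothetical, the scores are invented toy numbers (not results from the paper), and Spearman rank correlation is chosen here as one plausible way to compare rankings, which the abstract does not specify.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy scores, invented for illustration only (NOT results from the paper).
# Each list holds one score per explanation method, in METHODS order.
METHODS = ["method_a", "method_b", "method_c", "method_d", "random_map"]
SCORES = {
    "metric_1": [0.80, 0.65, 0.70, 0.40, 0.10],
    "metric_2": [0.75, 0.60, 0.72, 0.35, 0.05],
    "metric_3": [0.20, 0.55, 0.30, 0.50, 0.45],
}


def pairwise_spearman(scores):
    """Rank correlation for every pair of metrics; values near 1 hint at redundancy."""
    return {
        (a, b): spearman(scores[a], scores[b])
        for a, b in combinations(scores, 2)
    }


def methods_below_dummy(scores, methods, dummy, higher_is_better=True):
    """For each metric, list the real methods that score worse than the dummy."""
    d = methods.index(dummy)
    worse = {}
    for metric, vals in scores.items():
        ref = vals[d]
        worse[metric] = [
            m for m, v in zip(methods, vals)
            if m != dummy and ((v < ref) if higher_is_better else (v > ref))
        ]
    return worse


if __name__ == "__main__":
    for pair, rho in pairwise_spearman(SCORES).items():
        print(pair, round(rho, 3))
    print(methods_below_dummy(SCORES, METHODS, "random_map"))
```

In this toy setup, two metrics that order the methods identically would be candidates for redundancy, while a metric that places real methods below a random saliency map would be a candidate for closer scrutiny. The sketch assumes higher scores are better for every metric; a metric where lower is better would need `higher_is_better=False`.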
