CV, IV | Jul 29, 2024

Mean Opinion Score as a New Metric for User-Evaluation of XAI Methods

Hyeon Yu, Jenny Benois-Pineau, Romain Bourqui, Romain Giot, Alexey Zhukov

arXiv:2407.20427v1 | 2.0 | h-index: 2

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of user evaluation in XAI for researchers and practitioners. Its contribution is incremental: it adapts an existing metric from image quality assessment to XAI.

This paper tackles the problem of evaluating explainable AI (XAI) methods by proposing Mean Opinion Score (MOS) as a user-centric metric. The MOS of MLFEM correlates most strongly with the automatic metrics IAUC and DAUC, but correlations are limited overall, indicating a lack of consensus between user-centric and automatic evaluations.

This paper investigates the use of Mean Opinion Score (MOS), a common image quality metric, as a user-centric evaluation metric for XAI post-hoc explainers. To measure the MOS, a user experiment is proposed, which has been conducted with explanation maps of intentionally distorted images. Three methods from the family of feature attribution methods - Gradient-weighted Class Activation Mapping (Grad-CAM), Multi-Layered Feature Explanation Method (MLFEM), and Feature Explanation Method (FEM) - are compared with this metric. Additionally, the correlation of this new user-centric metric with automatic metrics is studied via Spearman's rank correlation coefficient. MOS of MLFEM shows the highest correlation with automatic metrics of Insertion Area Under Curve (IAUC) and Deletion Area Under Curve (DAUC). However, the overall correlations are limited, which highlights the lack of consensus between automatic and user-centric metrics.
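The two computations named in the abstract, averaging user ratings into a MOS and comparing the result with an automatic metric via Spearman's rank correlation, can be sketched as follows. This is a minimal illustration, not the authors' code: the rating scale, the number of raters, and every value in `user_ratings` and `iauc` are hypothetical placeholders, and the function names are invented for this example.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """MOS of one explanation map: the arithmetic mean of its user ratings."""
    return mean(ratings)


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: three raters score four explanation maps (assumed 1-5 scale).
user_ratings = [[4, 5, 4], [2, 3, 2], [3, 3, 4], [1, 2, 1]]
# Hypothetical IAUC values for the same four maps, in the same order.
iauc = [0.80, 0.55, 0.40, 0.30]

mos = [mean_opinion_score(r) for r in user_ratings]
rho = spearman(mos, iauc)
print(f"Spearman correlation between MOS and IAUC: {rho:.2f}")
```

Because the correlation is computed on ranks, it only measures whether users and the automatic metric order the explanation maps the same way, not whether their scores agree in magnitude. The sketch does not handle the degenerate case where all values in one list are identical, which would make the denominator zero.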
