CL · Jun 21, 2024

Evaluating Input Feature Explanations through a Unified Diagnostic Evaluation Framework

Jingyi Sun, Pepa Atanasova, Isabelle Augenstein

arXiv:2406.15085v2 · 10.414 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for standardized evaluation in explainable AI, but it is incremental as it builds on existing explanation methods without introducing new ones.

The authors tackled the problem of comparing different types of input feature explanations for machine learning models by developing a unified diagnostic evaluation framework, revealing that interactive span explanations generally outperform other types across most properties.

Explaining the decision-making process of machine learning models is crucial for ensuring their reliability and transparency for end users. One popular explanation form highlights key input features, such as i) tokens (e.g., Shapley Values and Integrated Gradients), ii) interactions between tokens (e.g., Bivariate Shapley and Attention-based methods), or iii) interactions between spans of the input (e.g., Louvain Span Interactions). However, these explanation types have only been studied in isolation, making it difficult to judge their respective applicability. To bridge this gap, we develop a unified framework, comprising four diagnostic properties, that facilitates an automated and direct comparison between highlight and interactive explanations. We conduct an extensive analysis of these three types of input feature explanations (each using three different explanation techniques) across two datasets and two models, and reveal that each explanation type has distinct strengths across the different diagnostic properties. Nevertheless, interactive span explanations outperform other types of input feature explanations across most diagnostic properties. These explanations remain relatively understudied, and our analysis underscores the need for further research to improve the methods that generate them. Additionally, integrating them with other explanation types that perform better on certain properties could further enhance their overall effectiveness.
