Evaluating the Ability of Explanations to Disambiguate Models in a Rashomon Set

Kaivalya Rawal, Eoin Delaney, Zihao Fu, Sandra Wachter, Chris Russell

arXiv:2601.08703v1 · 2.4 · h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses the challenge of selecting trustworthy models for deployment in explainable AI. The advance is incremental because it builds on prior work on explanation evaluation.

The paper tackles the problem of evaluating explanations in a Rashomon set of similarly performing models. It shows that existing evaluation methods can be misled by adversarial fairwashing, and that AXE, an evaluation method the authors proposed in earlier work, detects such fairwashing with a 100% success rate.

Explainable artificial intelligence (XAI) is concerned with producing explanations indicating the inner workings of models. For a Rashomon set of similarly performing models, explanations provide a way of disambiguating the behavior of individual models, helping select models for deployment. However, explanations themselves can vary depending on the explainer used, and so they need to be evaluated. In the paper "Evaluating Model Explanations without Ground Truth", we proposed three principles of explanation evaluation and a new method, "AXE", to evaluate the quality of feature-importance explanations. We go on to illustrate how evaluation metrics that rely on comparing model explanations against ideal ground-truth explanations obscure behavioral differences within a Rashomon set. Explanation evaluation aligned with our proposed principles would instead highlight these differences, helping select models from the Rashomon set. Selecting an alternative model from the Rashomon set can maintain identical predictions yet mislead explainers into generating false explanations, and mislead evaluation methods into considering those false explanations to be of high quality. AXE, our proposed explanation evaluation method, can detect this adversarial fairwashing of explanations with a 100% success rate. Unlike prior explanation evaluation strategies, such as those based on model sensitivity or ground-truth comparison, AXE can determine when protected attributes are used to make predictions.
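The idea of a Rashomon set can be made concrete with a small sketch. The code below is an illustration only and is not taken from the paper: the helper names (`accuracy`, `rashomon_set`), the toy features (`protected`, `proxy`, `income`), and the tolerance `epsilon` are all assumptions chosen for the example, and it does not implement AXE. It collects the models whose accuracy is within `epsilon` of the best one, then checks that two members give identical predictions even though they rely on different features.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Model = Callable[[Row], int]


def accuracy(model: Model, rows: List[Row], labels: List[int]) -> float:
    """Fraction of rows on which the model's prediction matches the label."""
    correct = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return correct / len(labels)


def rashomon_set(models: Dict[str, Model], rows: List[Row],
                 labels: List[int], epsilon: float) -> List[str]:
    """Names of models whose accuracy is within epsilon of the best model."""
    scores = {name: accuracy(m, rows, labels) for name, m in models.items()}
    best = max(scores.values())
    return sorted(name for name, s in scores.items() if best - s <= epsilon)


# Hypothetical toy data: "proxy" duplicates the protected attribute exactly.
rows = [
    {"protected": 1.0, "proxy": 1.0, "income": 0.2},
    {"protected": 0.0, "proxy": 0.0, "income": 0.9},
    {"protected": 1.0, "proxy": 1.0, "income": 0.8},
    {"protected": 0.0, "proxy": 0.0, "income": 0.1},
]
labels = [1, 0, 1, 0]

# Three hypothetical threshold classifiers, each reading a single feature.
models: Dict[str, Model] = {
    "uses_protected": lambda r: int(r["protected"] > 0.5),
    "uses_proxy": lambda r: int(r["proxy"] > 0.5),
    "uses_income": lambda r: int(r["income"] > 0.5),
}

members = rashomon_set(models, rows, labels, epsilon=0.05)

# Two members agree on every row although they read different features.
identical = all(
    models["uses_protected"](r) == models["uses_proxy"](r) for r in rows
)

print(members)
print(identical)
```

In this toy setting the two surviving models cannot be told apart by their predictions, so only an explanation that correctly names the feature each one reads would disambiguate them. That is the situation in which the abstract argues explanation evaluation matters.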
