LG ML Feb 1, 2022

Framework for Evaluating Faithfulness of Local Explanations

Sanjoy Dasgupta, Nave Frost, Michal Moshkovitz

arXiv:2202.00734v1 22.989 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for reliable evaluation metrics in explainable AI, particularly for researchers and practitioners using black-box models, though it is incremental in building on existing explanation methods.

The paper addresses how to evaluate the faithfulness of a local explanation system to the underlying prediction model. It introduces two properties, consistency and sufficiency, with quantitative measures that depend on the test-time data distribution, and it provides estimators with sample complexity bounds. The properties and estimators are validated experimentally.

We study the faithfulness of an explanation system to the underlying prediction model. We show that this can be captured by two properties, consistency and sufficiency, and introduce quantitative measures of the extent to which these hold. Interestingly, these measures depend on the test-time data distribution. For a variety of existing explanation systems, such as anchors, we analytically study these quantities. We also provide estimators and sample complexity bounds for empirically determining the faithfulness of black-box explanation systems. Finally, we experimentally validate the new properties and estimators.
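The abstract does not spell out the measures, so the following is only an illustrative sketch under a stated assumption: consistency is read here as the probability that two inputs receiving the same explanation also receive the same prediction, estimated over pairs in a sample drawn from the test-time distribution. The function name `estimate_consistency` and the `predict` and `explain` callables are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def estimate_consistency(samples, predict, explain):
    """Plug-in estimate of pairwise consistency (assumed definition).

    Returns the fraction of sample pairs sharing an explanation that also
    share a prediction, or None if no two samples share an explanation.
    `explain(x)` must return a hashable value.
    """
    # counts[explanation][prediction] = number of samples with that pair
    counts = defaultdict(lambda: defaultdict(int))
    for x in samples:
        counts[explain(x)][predict(x)] += 1

    agreeing_pairs = 0
    total_pairs = 0
    for by_prediction in counts.values():
        n = sum(by_prediction.values())
        # All unordered pairs that share this explanation.
        total_pairs += n * (n - 1) // 2
        # Pairs that additionally share a prediction.
        agreeing_pairs += sum(c * (c - 1) // 2 for c in by_prediction.values())

    if total_pairs == 0:
        return None
    return agreeing_pairs / total_pairs
```

Because the estimate is computed from samples, it changes when the sample distribution changes, which matches the abstract's remark that the measures depend on the test-time data distribution. A sufficiency estimate would additionally need a way to test whether an explanation applies to a new input, which this sketch does not assume.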
