LG · ML · Feb 1, 2022

Framework for Evaluating Faithfulness of Local Explanations

arXiv:2202.00734v1 · 89 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for reliable evaluation metrics in explainable AI, which matters most to researchers and practitioners working with black-box models. It is an incremental advance that builds on existing explanation methods.

The paper tackles the problem of evaluating how faithful a local explanation system is to the prediction model it explains. It introduces two properties, consistency and sufficiency, with quantitative measures that depend on the test-time data distribution. It also provides estimators and sample complexity bounds, and validates them experimentally.

We study the faithfulness of an explanation system to the underlying prediction model. We show that this can be captured by two properties, consistency and sufficiency, and introduce quantitative measures of the extent to which these hold. Interestingly, these measures depend on the test-time data distribution. For a variety of existing explanation systems, such as anchors, we analytically study these quantities. We also provide estimators and sample complexity bounds for empirically determining the faithfulness of black-box explanation systems. Finally, we experimentally validate the new properties and estimators.
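The abstract does not spell out the definitions, so the following is only a rough sketch under an assumed reading: sufficiency at a point x is taken to be the fraction of test-time samples covered by x's explanation that receive the same prediction as x. The names `model`, `anchor`, `estimate_local_sufficiency`, and `hoeffding_sample_size` are hypothetical, and the sample-size rule is the generic Hoeffding bound, not the bound derived in the paper.

```python
import math
import random


def hoeffding_sample_size(eps, delta):
    """Number of i.i.d. [0, 1]-valued samples so that their empirical mean is
    within eps of the true mean with probability at least 1 - delta
    (generic Hoeffding bound, not the paper's own result)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def estimate_local_sufficiency(model, applies, x, samples):
    """Assumed reading of sufficiency: among samples covered by x's
    explanation, the fraction predicted the same as x.
    Returns None when no sample is covered."""
    covered = [z for z in samples if applies(z)]
    if not covered:
        return None
    target = model(x)
    return sum(model(z) == target for z in covered) / len(covered)


# Hypothetical toy black box: predicts 1 when the first feature exceeds 0.5.
def model(z):
    return int(z[0] > 0.5)


# Hypothetical anchor-style explanation for x: "first feature > 0.7".
def anchor(z):
    return z[0] > 0.7


random.seed(0)
n = hoeffding_sample_size(eps=0.1, delta=0.05)
# Stand-in for the test-time distribution: uniform on the unit square.
samples = [(random.random(), random.random()) for _ in range(n)]
x = (0.9, 0.2)
print(n, estimate_local_sufficiency(model, anchor, x, samples))
```

The Hoeffding count applies to the samples that are actually averaged. Only covered samples enter the average here, so more draws from the test-time distribution would be needed in practice to reach the same guarantee.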

