CL, LG | Apr 30, 2022

ExSum: From Local Explanations to Model Understanding

Microsoft, UW
arXiv:2205.00130v1640 citations | h-index: 46 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses a practical problem: how practitioners deploying black-box models can reach a reliable understanding of them. Its contribution is incremental, building on prior interpretability methods.

The authors tackle the problem of quantifying and assessing the model understanding that people derive from local explanations. They introduce the ExSum framework and show, on two domains, that it exposes limitations in current practice and reveals model properties that are easily overlooked.

Interpretability methods are developed to understand the working mechanisms of black-box models, which is crucial to their responsible deployment. Fulfilling this goal requires both that the explanations generated by these methods are correct and that people can easily and reliably understand them. While the former has been addressed in prior work, the latter is often overlooked, resulting in informal model understanding derived from a handful of local explanations. In this paper, we introduce explanation summary (ExSum), a mathematical framework for quantifying model understanding, and propose metrics for its quality assessment. On two domains, ExSum highlights various limitations in the current practice, helps develop accurate model understanding, and reveals easily overlooked properties of the model. We also connect understandability to other properties of explanations such as human alignment, robustness, and counterfactual minimality and plausibility.

Code Implementations: 1 repo
