CL, LG | Apr 30, 2022

ExSum: From Local Explanations to Model Understanding

Yilun Zhou, Marco Tulio Ribeiro, Julie Shah

Microsoft, UW

arXiv:2205.00130v1 | 32.06 | 40 citations | h-index: 46 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of reliable model understanding for practitioners deploying black-box models, although it is an incremental advance that builds on prior interpretability methods.

The authors tackle the problem of quantifying and assessing the model understanding that people derive from local explanations. They introduce the ExSum framework and show, in two domains, that it reveals limitations of current practice as well as model properties that are easily overlooked.

Interpretability methods are developed to understand the working mechanisms of black-box models, which is crucial to their responsible deployment. Fulfilling this goal requires both that the explanations generated by these methods are correct and that people can easily and reliably understand them. While the former has been addressed in prior work, the latter is often overlooked, resulting in informal model understanding derived from a handful of local explanations. In this paper, we introduce explanation summary (ExSum), a mathematical framework for quantifying model understanding, and propose metrics for its quality assessment. On two domains, ExSum highlights various limitations in the current practice, helps develop accurate model understanding, and reveals easily overlooked properties of the model. We also connect understandability to other properties of explanations such as human alignment, robustness, and counterfactual minimality and plausibility.
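The abstract does not spell out the quality metrics. As an illustration only, the sketch below assumes a simple rule-based reading of a summary: a rule applies to some feature-level explanation values (here, word saliencies from a local explanation) and asserts an allowed range for them. Under that assumption, two natural scores are coverage (the share of features the rule applies to) and validity (the share of covered features whose value falls inside the asserted range). The `Rule` class, the metric functions, and the toy saliency values are hypothetical and are not taken from the paper or its released code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical feature-level unit: (word, saliency value) from a local explanation.
Feature = Tuple[str, float]


@dataclass
class Rule:
    """Hypothetical summary rule: which features it covers and the range it asserts."""
    applies: Callable[[Feature], bool]  # does the rule apply to this feature?
    low: float                          # lower bound of the asserted saliency range
    high: float                         # upper bound of the asserted saliency range


def coverage(rule: Rule, features: List[Feature]) -> float:
    """Fraction of all features that the rule applies to."""
    if not features:
        return 0.0
    return sum(rule.applies(f) for f in features) / len(features)


def validity(rule: Rule, features: List[Feature]) -> float:
    """Among covered features, fraction whose saliency lies in [low, high]."""
    covered = [f for f in features if rule.applies(f)]
    if not covered:
        return 0.0
    return sum(rule.low <= s <= rule.high for _, s in covered) / len(covered)


# Toy saliencies (made up for illustration) for a sentiment model's explanation.
features = [("great", 0.8), ("terrible", -0.7), ("movie", 0.05),
            ("excellent", 0.6), ("good", -0.1)]
positive_words = {"great", "excellent", "good"}

# Hypothetical rule: "positive words receive non-negative saliency".
rule = Rule(applies=lambda f: f[0] in positive_words, low=0.0, high=1.0)

print(f"coverage={coverage(rule, features):.2f}, validity={validity(rule, features):.2f}")
# prints coverage=0.60, validity=0.67
```

The two scores pull in different directions: a broader rule raises coverage but tends to lower validity, which is one way to see why an informal summary drawn from a handful of explanations can overstate how well it describes the model.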
