LG ML May 31, 2020

Evaluations and Methods for Explanation through Robustness Analysis

Cheng-Yu Hsieh, Chih-Kuan Yeh, Xuanqing Liu, Pradeep Ravikumar, Seungyeon Kim, Sanjiv Kumar, Cho-Jui Hsieh

arXiv:2006.00442v222.270 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of reliably explaining model predictions for users in machine learning, though it is incremental as it builds on existing explanation methods.

The paper tackles the problem of evaluating feature-based explanations by introducing a new set of criteria based on robustness analysis, using adversarial perturbations instead of feature removal to avoid biases, and validates this through experiments and a user study.

Feature-based explanations, which provide the importance of each feature towards the model prediction, are arguably one of the most intuitive ways to explain a model. In this paper, we establish a novel set of evaluation criteria for such feature-based explanations by robustness analysis. In contrast to existing evaluations, which require us to specify some way to "remove" features that could inevitably introduce biases and artifacts, we make use of the subtler notion of smaller adversarial perturbations. By optimizing towards our proposed evaluation criteria, we obtain new explanations that are loosely necessary and sufficient for a prediction. We further extend the explanation to extract the set of features that would move the current prediction to a target class by adopting a targeted adversarial attack for the robustness analysis. Through experiments across multiple domains and a user study, we validate the usefulness of our evaluation criteria and our derived explanations.
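
The idea of scoring an explanation by robustness, rather than by feature removal, can be illustrated with a toy case. The following is a minimal sketch and not the authors' implementation: it assumes a linear binary classifier `sign(w @ x + b)`, for which the smallest L2 perturbation confined to a feature subset has a closed form. The function name `min_perturbation_norm`, the weights, and the input are all hypothetical values chosen for illustration. Under this assumption, a good set of "relevant" features should need only a small perturbation to reach the decision boundary, while the remaining features should need a large one.

```python
import numpy as np

def min_perturbation_norm(w, b, x, subset):
    """Smallest L2 perturbation, restricted to the features in `subset`,
    that moves x onto the decision boundary of the linear model w @ x + b.

    Closed form for a linear model (an assumption of this sketch):
    |w @ x + b| / ||w[subset]||_2. Returns inf if the subset has no influence.
    """
    idx = np.asarray(subset, dtype=int)
    w_sub_norm = np.linalg.norm(w[idx])
    if w_sub_norm == 0.0:
        return np.inf
    return abs(w @ x + b) / w_sub_norm

# Hypothetical linear model and input.
w = np.array([3.0, 0.0, 4.0, 0.5])
b = -1.0
x = np.array([1.0, 2.0, 1.0, 2.0])

# Candidate explanation: features 0 and 2 are claimed to be the relevant ones.
relevant = [0, 2]
irrelevant = [1, 3]

r_relevant = min_perturbation_norm(w, b, x, relevant)
r_irrelevant = min_perturbation_norm(w, b, x, irrelevant)

print(f"perturbing only relevant features:   {r_relevant:.2f}")
print(f"perturbing only irrelevant features: {r_irrelevant:.2f}")
```

For non-linear models there is no such closed form, and the minimal perturbation would have to be approximated with an adversarial attack restricted to the chosen features; the sketch only shows the shape of the comparison.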
