AS AI SD Jul 19, 2021

On the Veracity of Local, Model-agnostic Explanations in Audio Classification: Targeted Investigations with Adversarial Examples

Verena Praher, Katharina Prinz, Arthur Flexer, Gerhard Widmer

arXiv:2107.09045v25.110 citations Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of verifying explanation methods, which matters to machine learning researchers and practitioners, particularly in music information retrieval. Its contribution is incremental, as it builds on existing adversarial testing approaches.

The paper investigates the reliability of LIME's local explanations in audio classification. It uses adversarial examples to test whether LIME correctly identifies the input features responsible for a model's predictions. The results suggest that LIME does not necessarily identify the most relevant features, which leaves open whether its explanations are useful or even misleading.

Local explanation methods such as LIME have become popular in MIR as tools for generating post-hoc, model-agnostic explanations of a model's classification decisions. The basic idea is to identify a small set of human-understandable features of the classified example that are most influential on the classifier's prediction. These are then presented as an explanation. Evaluation of such explanations in publications often resorts to accepting what matches the expectation of a human without actually being able to verify if what the explanation shows is what really caused the model's prediction. This paper reports on targeted investigations where we try to get more insight into the actual veracity of LIME's explanations in an audio classification task. We deliberately design adversarial examples for the classifier, in a way that gives us knowledge about which parts of the input are potentially responsible for the model's (wrong) prediction. Asking LIME to explain the predictions for these adversaries permits us to study whether local explanations do indeed detect these regions of interest. We also look at whether LIME is more successful in finding perturbations that are more prominent and easily noticeable for a human. Our results suggest that LIME does not necessarily manage to identify the most relevant input features and hence it remains unclear whether explanations are useful or even misleading.
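The evaluation idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' code and does not use the `lime` library. It assumes a hypothetical toy classifier (`toy_classifier`), a 1-D signal split into equal-length segments, a perturbation planted in one known segment, and a simplified LIME-style surrogate (random segment masking, an exponential proximity kernel, and a weighted linear fit). All names, sizes, and the kernel width are illustrative assumptions.

```python
import numpy as np

N_SEGMENTS = 10      # assumed number of interpretable segments
SIGNAL_LEN = 1000    # assumed signal length (divisible by N_SEGMENTS)
TRUE_SEGMENT = 3     # segment where the perturbation is planted (known ground truth)


def toy_classifier(z):
    """Hypothetical stand-in model: reacts only to energy in samples 300..399."""
    return float(np.tanh(np.sum(z[300:400] ** 2)))


def lime_style_weights(predict, x, n_segments, n_samples=500,
                       kernel_width=0.25, seed=0):
    """Simplified LIME-style surrogate: one linear weight per segment."""
    rng = np.random.default_rng(seed)
    seg_len = len(x) // n_segments
    # Binary masks: 1 keeps a segment, 0 silences it.
    masks = rng.integers(0, 2, size=(n_samples, n_segments))
    masks[0] = 1  # include the unmodified input as one sample
    preds = np.empty(n_samples)
    for i, mask in enumerate(masks):
        z = x.copy()
        for s in np.flatnonzero(mask == 0):
            z[s * seg_len:(s + 1) * seg_len] = 0.0
        preds[i] = predict(z)
    # Proximity kernel: samples with fewer silenced segments count more.
    dist = 1.0 - masks.mean(axis=1)
    sample_w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted least squares with an intercept column.
    design = np.hstack([np.ones((n_samples, 1)), masks])
    sqrt_w = np.sqrt(sample_w)
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], preds * sqrt_w,
                               rcond=None)
    return coef[1:]  # drop the intercept


# Build a quiet "clean" signal and plant a perturbation in a known segment.
rng = np.random.default_rng(42)
clean = rng.normal(0.0, 0.01, SIGNAL_LEN)
adversarial = clean.copy()
adversarial[300:400] += 0.5

weights = lime_style_weights(toy_classifier, adversarial, N_SEGMENTS)
top_k = np.argsort(weights)[::-1][:3]   # segments ranked by surrogate weight
hit = TRUE_SEGMENT in top_k             # did the explanation find the planted region?
print("top segments:", top_k.tolist(), "hit:", hit)
```

In this deliberately easy setting the surrogate recovers the planted segment. The paper's point is that with a real audio classifier and real adversarial perturbations, this kind of check does not necessarily succeed, so the sketch shows only the shape of the test, not its outcome.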

