Evaluating Feature Attribution Methods in the Image Domain
This work addresses the need of researchers and practitioners in interpretable AI for reliable benchmarking of attribution methods. Its contribution is incremental, as it builds on prior evaluation studies.
The paper tackles the problem of objectively evaluating feature attribution maps for image models. It finds that different metrics appear to measure distinct concepts, that metric results on one dataset do not necessarily generalize to other datasets, and that methods with desirable theoretical properties, such as DeepSHAP, do not necessarily outperform computationally cheaper alternatives.
Feature attribution maps are a popular approach to highlight the most important pixels in an image for a given prediction of a model. Despite a recent growth in popularity and available methods, little attention is given to the objective evaluation of such attribution maps. Building on previous work in this domain, we investigate existing metrics and propose new variants of metrics for the evaluation of attribution maps. We confirm a recent finding that different attribution metrics seem to measure different underlying concepts of attribution maps, and extend this finding to a larger selection of attribution metrics. We also find that metric results on one dataset do not necessarily generalize to other datasets, and methods with desirable theoretical properties such as DeepSHAP do not necessarily outperform computationally cheaper alternatives. Based on these findings, we propose a general benchmarking approach to identify the ideal feature attribution method for a given use case. Implementations of attribution metrics and our experiments are available online.
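To make the idea of an attribution metric concrete, the following is a minimal sketch of one widely used family, a perturbation-based "deletion" score: pixels are masked in order of decreasing attribution, and a faster drop in model output (a smaller area under the curve) suggests a more faithful map. The abstract does not state which metrics the paper evaluates, so this is only an illustration and not the authors' implementation. The names `deletion_curve` and `deletion_auc`, the zero baseline, and the toy linear "model" are all assumptions made for this example.

```python
import numpy as np

def deletion_curve(model, image, attribution, steps=8, baseline=0.0):
    """Model output as the highest-attributed pixels are progressively masked.

    Hypothetical helper for illustration; `model` maps an image array to a scalar.
    """
    order = np.argsort(attribution.ravel())[::-1]  # most important pixel first
    masked = image.copy().ravel()
    n = order.size
    scores = [model(masked.reshape(image.shape))]  # score of the unmasked image
    for k in range(1, steps + 1):
        upto = (k * n) // steps
        masked[order[:upto]] = baseline  # replace the top pixels with the baseline
        scores.append(model(masked.reshape(image.shape)))
    return np.array(scores)

def deletion_auc(scores):
    """Area under the deletion curve (trapezoid rule, x-axis scaled to [0, 1])."""
    return float(np.mean((scores[:-1] + scores[1:]) / 2.0))

# Toy setup (assumption): a linear scorer with non-negative weights and pixels,
# for which weights * image is an exact per-pixel contribution.
rng = np.random.default_rng(0)
weights = rng.random((8, 8))
image = rng.random((8, 8))

def model(x):
    return float(np.sum(weights * x))

exact_attr = weights * image        # exact contributions for the linear model
random_attr = rng.random((8, 8))    # uninformative attribution for comparison

auc_exact = deletion_auc(deletion_curve(model, image, exact_attr))
auc_random = deletion_auc(deletion_curve(model, image, random_attr))
print(f"deletion AUC, exact attribution:  {auc_exact:.3f}")
print(f"deletion AUC, random attribution: {auc_random:.3f}")
```

In this toy setting the exact attribution should yield the lower deletion AUC, since it removes the largest contributions first. A benchmark in the spirit of the abstract would compute several such metrics for several attribution methods on the dataset of interest and compare the resulting rankings, rather than relying on a single score.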