A Dual-Perspective Approach to Evaluating Feature Attribution Methods
This work addresses the assessment of feature attribution methods, a problem relevant to researchers and practitioners in explainable AI. It offers a more cohesive evaluation framework, though the contribution is incremental because it builds on existing faithfulness evaluations.
The paper tackles the challenge of evaluating feature attribution methods for neural networks by introducing two new perspectives within the faithfulness paradigm: soundness and completeness. Both perspectives capture intuitive properties and come with quantitative metrics, which the paper applies to mainstream attribution methods.
Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model's behavior (i.e., faithfulness). While providing useful insights, existing faithfulness evaluations suffer from shortcomings that we reveal in this paper. In this work, we propose two new perspectives within the faithfulness paradigm that reveal intuitive properties: soundness and completeness. Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features. The two perspectives are based on a firm mathematical foundation and provide quantitative metrics that are computable through efficient algorithms. We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.
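To make the perturbation idea behind faithfulness concrete, the following is a minimal sketch of a generic deletion-style check: features are removed in order of decreasing attribution and the model output is recorded after each removal. It is not the soundness or completeness metric proposed in the paper, whose definitions are not given in this abstract. The names `deletion_curve` and `toy_model`, the linear toy model, and the zero baseline are all assumptions made for illustration.

```python
import numpy as np


def deletion_curve(model, x, attribution, baseline=0.0):
    """Record the model output as the most-attributed features are removed.

    Hypothetical helper: `model` maps a 1-D array to a scalar, and "removal"
    means overwriting a feature with `baseline` (an assumed choice).
    """
    order = np.argsort(-attribution)  # highest attribution first
    x_perturbed = x.astype(float).copy()
    outputs = [model(x_perturbed)]
    for idx in order:
        x_perturbed[idx] = baseline
        outputs.append(model(x_perturbed))
    return np.array(outputs)


# Toy linear model (assumption): feature 1 has zero weight, so it is not predictive.
weights = np.array([3.0, 0.0, 1.0, 2.0])


def toy_model(x):
    return float(weights @ x)


x = np.ones(4)
good_attr = weights * x                      # ranks features by true contribution
poor_attr = np.array([0.0, 3.0, 2.0, 1.0])   # ranks the irrelevant feature first

good_curve = deletion_curve(toy_model, x, good_attr)
poor_curve = deletion_curve(toy_model, x, poor_attr)

print(good_curve)  # [6. 3. 1. 0. 0.]
print(poor_curve)  # [6. 6. 5. 3. 0.]
```

In this toy setting the attribution that tracks the true contributions makes the output fall faster, so its curve has a lower mean. A single deletion curve of this kind does not separate the two questions posed above, namely whether the attributed features are truly predictive and whether all predictive features have been found.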