Towards Rigorous Interpretations: a Formalisation of Feature Attribution
This work addresses rigorous interpretability in machine learning, which is crucial for building trust in AI systems. Its contribution is incremental, however, as it builds on existing formalisation efforts.
The paper tackles the inconsistent definitions and the lack of ground truth in feature attribution by formalising feature selection in terms of relaxed functional dependence. Evaluated on synthetic datasets, some state-of-the-art methods are shown to fail the proposed properties.
Feature attribution is often loosely presented as the process of selecting a subset of relevant features as a rationale for a prediction. Although relevance is task-dependent by nature, the precise definitions of "relevance" encountered in the literature are not always consistent. This lack of clarity stems both from the fact that we usually have no access to any notion of ground-truth attribution and from a more general debate on what good interpretations are. In this paper, we propose to formalise feature selection/attribution based on the concept of relaxed functional dependence. In particular, we extend our notions to the instance-wise setting and derive necessary properties for candidate selection solutions, while leaving room for task-dependence. By computing ground-truth attributions on synthetic datasets, we evaluate many state-of-the-art attribution methods and show that, even when optimised, some fail to satisfy the proposed properties and provide wrong solutions.
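To make the idea of a ground-truth selection on synthetic data concrete, the following is a minimal sketch and not the paper's actual definition or algorithm. It assumes one possible reading of relaxed functional dependence: a feature subset is accepted when the label is determined by the subset's values on at least a fraction 1 - epsilon of the samples. The names `dependence_score`, `minimal_subsets` and `epsilon` are hypothetical and introduced only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations, product


def dependence_score(X, y, subset):
    """Fraction of samples whose label equals the majority label among
    samples sharing the same values on `subset`.
    A score of 1.0 means y is an exact function of the selected features.
    (Illustrative relaxation, assumed here; not taken from the paper.)"""
    groups = defaultdict(Counter)
    for row, label in zip(X, y):
        key = tuple(row[i] for i in subset)
        groups[key][label] += 1
    agree = sum(max(counts.values()) for counts in groups.values())
    return agree / len(y)


def minimal_subsets(X, y, epsilon=0.0):
    """Return all smallest feature subsets with score >= 1 - epsilon.
    Exhaustive search, so only suitable for small synthetic problems."""
    n_features = len(X[0])
    for size in range(n_features + 1):
        found = [
            s for s in combinations(range(n_features), size)
            if dependence_score(X, y, s) >= 1.0 - epsilon
        ]
        if found:
            return found
    return []


# Synthetic dataset: y = x0 XOR x1, while x2 is irrelevant by construction.
X = [list(bits) for bits in product([0, 1], repeat=3)]
y = [row[0] ^ row[1] for row in X]

ground_truth = minimal_subsets(X, y)
print(ground_truth)
```

Because the generating function is known, the selection returned by the search can serve as a reference against which the output of an attribution method is compared. Raising `epsilon` trades exactness for smaller subsets, which is one way to leave room for task-dependence in such a sketch.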