On Relating 'Why?' and 'Why Not?' Explanations
By formally relating two distinct explanation types, this work gives researchers a foundation for developing more comprehensive and robust ML interpretability methods.
This paper establishes a formal relationship between 'Why?' explanations (sufficient feature-value pairs for a prediction) and 'Why Not?' explanations (changes in feature values that alter a prediction). It proves that 'Why?' explanations are minimal hitting sets of 'Why Not?' explanations, and vice-versa.
Explanations of Machine Learning (ML) models often address a 'Why?' question. Such explanations amount to selecting a set of feature-value pairs that is sufficient for the prediction. Recent work has investigated explanations that address a 'Why Not?' question, i.e., finding a change of feature values that guarantees a change of prediction. Given their goals, these two forms of explaining predictions of ML models appear to be mostly unrelated. However, this paper demonstrates otherwise and establishes a rigorous formal relationship between 'Why?' and 'Why Not?' explanations. Concretely, the paper proves that, for any given instance, 'Why?' explanations are minimal hitting sets of 'Why Not?' explanations and vice versa. Furthermore, the paper devises novel algorithms for extracting and enumerating both forms of explanations.
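The hitting-set relationship can be checked by hand on a very small example. The sketch below is illustrative only and is not the paper's algorithm: it assumes a hypothetical three-feature Boolean classifier (`predict`) and enumerates both explanation types by brute force over all feature subsets, which is only feasible for a handful of features. The helper names (`is_sufficient`, `minimal_sets`, `why`, `why_not`, `minimal_hitting_sets`) are made up for this illustration, and explanations are represented as sets of feature indices.

```python
from itertools import combinations, product

N_FEATURES = 3
ALL = frozenset(range(N_FEATURES))

def predict(x):
    # Hypothetical toy classifier (an assumption, not from the paper).
    return int((x[0] and x[1]) or x[2])

def is_sufficient(fixed, instance):
    """True if fixing the features in `fixed` to their instance values
    forces the prediction, whatever values the other features take."""
    target = predict(instance)
    free = [i for i in range(N_FEATURES) if i not in fixed]
    for values in product([0, 1], repeat=len(free)):
        x = list(instance)
        for i, val in zip(free, values):
            x[i] = val
        if predict(x) != target:
            return False
    return True

def minimal_sets(holds):
    """Subset-minimal feature sets satisfying `holds`, found by
    enumerating candidates in order of increasing size."""
    found = []
    for k in range(N_FEATURES + 1):
        for subset in combinations(range(N_FEATURES), k):
            s = frozenset(subset)
            if any(m <= s for m in found):
                continue  # a smaller satisfying set is already inside s
            if holds(s):
                found.append(s)
    return set(found)

def why(instance):
    # 'Why?': minimal sets of features sufficient for the prediction.
    return minimal_sets(lambda s: is_sufficient(s, instance))

def why_not(instance):
    # 'Why Not?': minimal sets of features whose values, if allowed to
    # change while the rest stay fixed, can change the prediction.
    return minimal_sets(lambda s: not is_sufficient(ALL - s, instance))

def minimal_hitting_sets(family):
    # Minimal sets that intersect every member of `family`.
    return minimal_sets(lambda s: all(s & t for t in family))

instance = (1, 1, 1)
axps = why(instance)
cxps = why_not(instance)

# Duality on this toy instance: each family is exactly the set of
# minimal hitting sets of the other.
assert axps == minimal_hitting_sets(cxps)
assert cxps == minimal_hitting_sets(axps)
```

For this instance, the 'Why?' explanations are the feature sets {2} and {0, 1}, while the 'Why Not?' explanations are {0, 2} and {1, 2}; each family consists of the minimal hitting sets of the other.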