Technical Note: Defining and Quantifying AND-OR Interactions for Faithful and Concise Explanation of DNNs
This work addresses the need for more interpretable AI explanations, but it appears incremental as it builds on existing interaction-based methods.
The paper tackles the problem of explaining deep neural networks by quantifying AND and OR interactions between input variables, which reflect the networks' inference logic; it proposes formal definitions of faithfulness and conciseness and proves the uniqueness of these interactions.
In this technical note, we aim to explain a deep neural network (DNN) by quantifying the interactions between input variables that the DNN encodes, which reflect its inference logic. Specifically, we first rethink the definition of interactions, and then formally define faithfulness and conciseness for interaction-based explanations. To this end, we propose two kinds of interactions: the AND interaction and the OR interaction. For faithfulness, we prove the uniqueness of the AND (OR) interaction in quantifying the effect of the AND (OR) relationship between input variables. Furthermore, based on AND-OR interactions, we design techniques to boost the conciseness of the explanation without hurting its faithfulness. In this way, the inference logic of a DNN can be faithfully and concisely explained by a set of symbolic concepts.
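The note gives no formulas at this point, so the following is only a minimal sketch under stated assumptions. It assumes that (i) the model output v splits into an AND component v_and and an OR component v_or with v_or(∅) = 0; (ii) the AND interaction takes the form of the Harsanyi dividend, I_and(S) = Σ_{T⊆S} (-1)^{|S|-|T|} v_and(T); and (iii) the OR interaction takes the dual form I_or(S) = -Σ_{T⊆S} (-1)^{|S|-|T|} v_or(N∖T). The toy functions v_and, v_or and all helper names are hypothetical. The code checks faithfulness numerically: summing AND terms over subsets contained in S and OR terms over subsets overlapping S should reproduce v(S) for every S. It then lists the few non-zero interactions, which would make up a concise explanation.

```python
from itertools import combinations

N = frozenset(range(4))  # hypothetical input variables x0..x3


def subsets(s):
    """Yield every subset of s (including the empty set) as a frozenset."""
    items = sorted(s)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


# Hypothetical toy "network" output, already split into an AND part and an
# OR part (assumption); v_or(empty set) = 0 so the OR reconstruction is exact.
def v_and(S):
    return 2.0 if {0, 1} <= S else 0.0  # fires only when x0 AND x1 are present


def v_or(S):
    return 1.0 if S & {2, 3} else 0.0  # fires when x2 OR x3 is present


def v(S):
    return v_and(S) + v_or(S)


def and_interaction(S):
    # Harsanyi dividend: Moebius transform of v_and over the subsets of S.
    return sum((-1) ** (len(S) - len(T)) * v_and(T) for T in subsets(S))


def or_interaction(S):
    # Assumed dual form: the same transform applied to v_or on complements N \ T.
    return -sum((-1) ** (len(S) - len(T)) * v_or(N - T) for T in subsets(S))


I_and = {S: and_interaction(S) for S in subsets(N)}
# The empty set overlaps no subset, so it never enters the OR sum; skip it.
I_or = {S: or_interaction(S) for S in subsets(N) if S}


def reconstruct(S):
    # AND terms: every T contained in S. OR terms: every T that overlaps S.
    and_part = sum(I for T, I in I_and.items() if T <= S)
    or_part = sum(I for T, I in I_or.items() if T & S)
    return and_part + or_part


# Faithfulness check: the interactions reproduce v on all 2^n subsets.
for S in subsets(N):
    assert abs(reconstruct(S) - v(S)) < 1e-9

# Conciseness: keep only the non-zero interactions.
salient_and = {tuple(sorted(S)): I for S, I in I_and.items() if abs(I) > 1e-9}
salient_or = {tuple(sorted(S)): I for S, I in I_or.items() if abs(I) > 1e-9}
print("AND:", salient_and)  # AND: {(0, 1): 2.0}
print("OR:", salient_or)    # OR: {(2, 3): 1.0}
```

Here the split of v into v_and and v_or is chosen by hand, which is the main simplifying assumption. With that split, the 31 candidate interactions collapse to one AND pattern {x0, x1} and one OR pattern {x2, x3}, while the faithfulness check still holds on all 16 subsets.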