Interpretable Artificial Intelligence through the Lens of Feature Interaction
This paper addresses the interpretability of deep learning for stakeholders in high-stakes domains, but its contribution is incremental because it focuses on surveying existing methods.
This paper surveys interpretability methods for deep learning models that explicitly consider feature interactions, and highlights why such methods matter for trustworthiness and fairness in critical applications such as credit approval and recidivism prediction.
Interpreting deep learning models is challenging because of their large number of parameters, complex connections between nodes, and unintelligible feature representations. Despite this, many view interpretability as key to trustworthiness, fairness, and safety, especially as deep learning is applied to more critical decision tasks such as credit approval, job screening, and recidivism prediction. A large body of research provides interpretability for deep learning models; however, many of the commonly used methods do not consider a phenomenon called "feature interaction." This work first explains the historical and present-day importance of feature interactions and then surveys the interpretability methods that explicitly consider them. The survey aims to bring to light the importance of feature interactions in the larger context of machine learning interpretability, especially now that deep learning models rely heavily on feature interactions.
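To make the term "feature interaction" concrete, the following minimal sketch uses one common working definition, which is an assumption here and not necessarily the definition the survey adopts: two features interact when their joint effect on the output is not the sum of their individual effects. The helper `interaction_effect`, the two toy models, and the all-zeros baseline are hypothetical and chosen only for illustration.

```python
def interaction_effect(f, x, baseline, i, j):
    """Mixed finite difference of f with respect to features i and j.

    Illustrative helper (not from the paper): the result is zero when the
    effects of features i and j simply add up between baseline and x.
    """
    def eval_with(use_i, use_j):
        # Start from the baseline and switch in the chosen features of x.
        z = list(baseline)
        if use_i:
            z[i] = x[i]
        if use_j:
            z[j] = x[j]
        return f(z)

    return (
        eval_with(True, True)
        - eval_with(True, False)
        - eval_with(False, True)
        + eval_with(False, False)
    )


def additive_model(z):
    # Toy model with no interaction: each feature contributes independently.
    return 2.0 * z[0] + 3.0 * z[1]


def multiplicative_model(z):
    # Toy model with a product term, so the two features interact.
    return 2.0 * z[0] + 3.0 * z[1] + 5.0 * z[0] * z[1]


x = [1.0, 1.0]
baseline = [0.0, 0.0]  # assumed reference input

additive_score = interaction_effect(additive_model, x, baseline, 0, 1)
multiplicative_score = interaction_effect(multiplicative_model, x, baseline, 0, 1)

print(additive_score)        # 0.0: the effects add up, no interaction
print(multiplicative_score)  # 5.0: the product term's contribution
```

An attribution method that assigns one score per feature has no place to put the 5.0 contributed jointly by the two features in the second model, which is the gap that interaction-aware methods are meant to address.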