MLLGJun 19, 2020

How does this interaction affect me? Interpretable attribution for feature interactions

arXiv:2006.10965v1102 citations
Originality Incremental advance
AI Analysis

This work addresses the need for transparent and interpretable explanations of feature interactions in predictions, which is crucial for analyzing model behavior in real-world applications, though it appears incremental in improving existing methods.

The paper tackles the problem of interpretable attribution for feature interactions in machine learning models, proposing the Archipelago framework which provides significantly more interpretable explanations than comparable methods, as shown in experiments on standard annotation labels.

Machine learning transparency calls for interpretable explanations of how inputs relate to predictions. Feature attribution is a way to analyze the impact of features on predictions. Feature interactions are the contextual dependence between features that jointly impact predictions. There are a number of methods that extract feature interactions in prediction models; however, the methods that assign attributions to interactions are either uninterpretable, model-specific, or non-axiomatic. We propose an interaction attribution and detection framework called Archipelago which addresses these problems and is also scalable in real-world settings. Our experiments on standard annotation labels indicate our approach provides significantly more interpretable explanations than comparable methods, which is important for analyzing the impact of interactions on predictions. We also provide accompanying visualizations of our approach that give new insights into deep neural networks.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes