ML · LG · ST · AP · Mar 22, 2021

Interpreting Deep Learning Models with Marginal Attribution by Conditioning on Quantiles

arXiv:2103.11706v1 · 13 citations
Originality: Incremental advance
AI Analysis

MACQ gives researchers and practitioners in explainable AI a new tool for understanding model behavior, though it is an incremental contribution to the growing literature on model interpretation.

The paper tackles the problem of interpreting deep learning models by introducing MACQ, a global gradient-based method that analyzes how features contribute to predictions across different output levels, enabling separation of marginal attribution from interaction effects.

A vastly growing literature on explaining deep learning models has emerged. This paper contributes to that literature by introducing a global gradient-based model-agnostic method, which we call Marginal Attribution by Conditioning on Quantiles (MACQ). Our approach is based on analyzing the marginal attribution of predictions (outputs) to individual features (inputs). Specifically, we consider variable importance by fixing (global) output levels and, thus, explain how features marginally contribute across different regions of the prediction space. Hence, MACQ can be seen as a marginal attribution counterpart to approaches such as accumulated local effects (ALE), which study the sensitivities of outputs by perturbing inputs. Furthermore, MACQ allows us to separate the marginal attribution of individual features from interaction effects, and to visually illustrate the 3-way relationship between marginal attribution, output level, and feature value.
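The idea of conditioning attributions on output levels can be illustrated with a small sketch. It is not the paper's estimator, which the abstract does not specify. It assumes that the marginal attribution of feature j is the gradient-times-input term x_j · ∂μ/∂x_j, that gradients are approximated by central finite differences, and that the conditional expectation at a fixed output quantile is approximated by averaging within quantile bins of the prediction. The names `toy_model`, `numerical_gradient`, and `macq_sketch` are hypothetical.

```python
import numpy as np

def toy_model(X):
    # Hypothetical stand-in for a fitted network: linear, no intercept.
    return 2.0 * X[:, 0] + X[:, 1] - 0.5 * X[:, 2]

def numerical_gradient(model, X, eps=1e-5):
    # Central finite-difference approximation of d model / d x_j for every row.
    grads = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        step = np.zeros(X.shape[1])
        step[j] = eps
        grads[:, j] = (model(X + step) - model(X - step)) / (2.0 * eps)
    return grads

def macq_sketch(model, X, n_bins=10):
    # Assumption: marginal attribution of feature j is x_j * d mu / d x_j.
    preds = model(X)
    attributions = X * numerical_gradient(model, X)

    # Assumption: approximate "conditioning on a quantile" by quantile binning.
    edges = np.quantile(preds, np.linspace(0.0, 1.0, n_bins + 1))
    bin_ids = np.clip(np.searchsorted(edges, preds, side="right") - 1, 0, n_bins - 1)

    # Mean output level and mean attribution per feature within each bin.
    levels = np.array([preds[bin_ids == b].mean() for b in range(n_bins)])
    mean_attr = np.vstack([attributions[bin_ids == b].mean(axis=0) for b in range(n_bins)])
    return levels, mean_attr

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
levels, mean_attr = macq_sketch(toy_model, X)
```

Row b of `mean_attr` holds the average attribution of each feature among observations whose prediction falls in the b-th quantile bin, and `levels[b]` is the mean prediction in that bin. For this intercept-free linear toy model the attributions in each row sum to the bin's output level. A nonlinear model would generally leave a remainder, which is where interaction terms would enter.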
