ML · LG · Jan 28, 2019

Interpreting Deep Neural Networks Through Variable Importance

arXiv:1901.09839v3 · 16 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for interpretable AI among researchers and practitioners, but the advance is incremental because it builds on existing methods such as RATE.

The paper tackles the problem of interpreting deep neural networks by proposing a method for global feature importance that accounts for variable dependence, applying it to computer vision, NLP, and social science with concrete results in these domains.

While the success of deep neural networks (DNNs) is well-established across a variety of domains, our ability to explain and interpret these methods is limited. Unlike previously proposed local methods which try to explain particular classification decisions, we focus on global interpretability and ask a universally applicable question: given a trained model, which features are the most important? In the context of neural networks, a feature is rarely important on its own, so our strategy is specifically designed to leverage partial covariance structures and incorporate variable dependence into feature ranking. Our methodological contributions in this paper are two-fold. First, we propose an effect size analogue for DNNs that is appropriate for applications with highly collinear predictors (ubiquitous in computer vision). Second, we extend the recently proposed "RelATive cEntrality" (RATE) measure (Crawford et al., 2019) to the Bayesian deep learning setting. RATE applies an information theoretic criterion to the posterior distribution of effect sizes to assess feature significance. We apply our framework to three broad application areas: computer vision, natural language processing, and social science.
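The two ingredients named in the abstract, an effect size analogue and a RATE-style ranking, can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions that the abstract does not spell out: the effect size analogue is taken to be the projection of the network's predicted function values onto the design matrix via the pseudoinverse, the posterior over effect sizes is approximated by a multivariate normal, the divergence is computed from the marginal to the conditional distribution, and posterior draws are simulated rather than produced by a Bayesian network. The names `effect_size_samples`, `gaussian_kl`, and `rate_scores` are hypothetical.

```python
import numpy as np

def effect_size_samples(X, f_samples):
    """Project each posterior draw of fitted values (rows of f_samples, shape (S, n))
    onto the design matrix X (shape (n, p)) using the Moore-Penrose pseudoinverse."""
    return f_samples @ np.linalg.pinv(X).T  # shape (S, p)

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) )."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff
                  - k + logdet1 - logdet0)

def rate_scores(mu, cov):
    """For each feature j, measure how much the distribution of the other
    effect sizes changes when beta_j is fixed at 0, then normalise to sum to 1."""
    p = mu.shape[0]
    kld = np.empty(p)
    for j in range(p):
        rest = np.delete(np.arange(p), j)
        mu_rest = mu[rest]
        cov_rest = cov[np.ix_(rest, rest)]
        cross = cov[rest, j]
        # Gaussian conditioning on beta_j = 0
        mu_cond = mu_rest - cross * (mu[j] / cov[j, j])
        cov_cond = cov_rest - np.outer(cross, cross) / cov[j, j]
        kld[j] = gaussian_kl(mu_rest, cov_rest, mu_cond, cov_cond)
    return kld / kld.sum()

# Toy data (assumption): features 0 and 1 are highly collinear,
# and only feature 0 drives the simulated network output.
rng = np.random.default_rng(0)
n, p, S = 200, 4, 500
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)
f_mean = 2.0 * X[:, 0]
# Stand-in for S posterior draws of the network's predictions
f_samples = f_mean + 0.1 * rng.normal(size=(S, n))

beta = effect_size_samples(X, f_samples)
mu, cov = beta.mean(axis=0), np.cov(beta, rowvar=False)
scores = rate_scores(mu, cov)
print(np.round(scores, 4))
```

Because each score depends on the covariance between effect sizes, a feature is ranked by how much fixing its effect at zero disturbs the others. In this sketch, features whose effect sizes are uncorrelated with all the others would receive a divergence of zero.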

Code Implementations: 1 repo