LG, AI | Jan 30, 2018

The Intriguing Properties of Model Explanations

arXiv:1801.09808v1 | 5 citations
Originality: Incremental advance
AI Analysis

This work examines the reliability and impact of interpretability methods in machine learning, a concern for anyone who relies on model explanations. Its contribution is incremental, since it analyzes existing techniques.

The paper investigates whether linear explanations of model predictions are consistent or potentially misleading, and examines how integrating explanations into the prediction process affects model performance. It finds that learning explanations and predictions jointly is often beneficial.

Linear approximations to the decision boundary of a complex model have become one of the most popular tools for interpreting predictions. In this paper, we study such linear explanations produced either post-hoc by a few recent methods or generated along with predictions with contextual explanation networks (CENs). We focus on two questions: (i) whether linear explanations are always consistent or can be misleading, and (ii) when integrated into the prediction process, whether and how explanations affect the performance of the model. Our analysis sheds more light on certain properties of explanations produced by different methods and suggests that learning models that explain and predict jointly is often advantageous.
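To make the idea of a post-hoc linear explanation concrete, here is a minimal sketch of a generic local linear surrogate fitted by least squares on random perturbations around a query point. It is not the paper's method, nor any specific published explainer: `black_box`, `local_linear_explanation`, the Gaussian sampling scheme, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def black_box(X):
    """Hypothetical nonlinear model: a logistic of a quadratic score."""
    score = X[:, 0] ** 2 - 2.0 * X[:, 1]
    return 1.0 / (1.0 + np.exp(-score))


def local_linear_explanation(predict, x, n_samples=2000, scale=0.1, seed=0):
    """Fit a linear surrogate to `predict` in a small neighbourhood of `x`.

    Returns (weights, intercept) of the least-squares fit
    predict(x + d) ~ intercept + weights @ d.
    """
    rng = np.random.default_rng(seed)
    # Sample Gaussian perturbations around the query point (illustrative choice).
    X = x + scale * rng.standard_normal((n_samples, x.shape[0]))
    y = predict(X)
    # Design matrix: centred features plus a constant column for the intercept.
    A = np.hstack([X - x, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


# Two query points that receive the same prediction from the toy model.
x0 = np.array([1.0, 0.5])
x1 = np.array([-1.0, 0.5])

weights, intercept = local_linear_explanation(black_box, x0)
weights_far, _ = local_linear_explanation(black_box, x1)

print("explanation at x0:", weights, "intercept:", intercept)
print("explanation at x1:", weights_far)
```

For this toy model the analytic gradient is (0.5, -0.5) at `x0` and (-0.5, -0.5) at `x1`, so the fitted weights should land close to those values. The sign of the first weight flips between two points with the same predicted probability, a small illustration of why one might ask whether linear explanations are consistent across inputs.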

Code Implementations: 1 repo
