LG, AI, CV, NE, ML
Jun 10, 2016

The Mythos of Model Interpretability

arXiv:1606.03490v3
4397 citations
AI Analysis

This work addresses a foundational problem, clarifying the discourse on interpretability for researchers and practitioners, but it is incremental: it synthesizes and critiques existing ideas rather than introducing new methods.

The paper critiques the ambiguous and often unsubstantiated claims about model interpretability in machine learning, analyzing diverse motivations and competing notions like transparency and post-hoc explanations, while questioning common assumptions about linear models and deep neural networks.

Supervised machine learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world? We want models to be not only good, but interpretable. And yet the task of interpretation appears underspecified. Papers provide diverse and sometimes non-overlapping motivations for interpretability, and offer myriad notions of what attributes render models interpretable. Despite this ambiguity, many papers proclaim interpretability axiomatically, absent further explanation. In this paper, we seek to refine the discourse on interpretability. First, we examine the motivations underlying interest in interpretability, finding them to be diverse and occasionally discordant. Then, we address model properties and techniques thought to confer interpretability, identifying transparency to humans and post-hoc explanations as competing notions. Throughout, we discuss the feasibility and desirability of different notions, and question the oft-made assertions that linear models are interpretable and that deep neural networks are not.

Code Implementations: 2 repos
