LGMLJul 8, 2019

The Price of Interpretability

arXiv:1907.03419v139 citations
AI Analysis

This work addresses the need for formal interpretability measures in decision-making applications, though it is incremental in extending existing proxies.

The paper tackles the loosely defined concept of interpretability in machine learning by introducing a mathematical framework to construct models through interpretable steps, quantifying the tradeoff between interpretability and predictive accuracy as the 'price' of interpretability.

When quantitative models are used to support decision-making on complex and important topics, understanding a model's ``reasoning'' can increase trust in its predictions, expose hidden biases, or reduce vulnerability to adversarial attacks. However, the concept of interpretability remains loosely defined and application-specific. In this paper, we introduce a mathematical framework in which machine learning models are constructed in a sequence of interpretable steps. We show that for a variety of models, a natural choice of interpretable steps recovers standard interpretability proxies (e.g., sparsity in linear models). We then generalize these proxies to yield a parametrized family of consistent measures of model interpretability. This formal definition allows us to quantify the ``price'' of interpretability, i.e., the tradeoff with predictive accuracy. We demonstrate practical algorithms to apply our framework on real and synthetic datasets.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes