LG | May 23, 2017

Interpreting Blackbox Models via Model Extraction

arXiv:1705.08504v6 | 192 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for interpretability in machine learning models used for consequential decisions, such as predicting diabetes risk. The advance is incremental because it builds on existing model extraction methods.

The paper tackles the problem of interpreting blackbox machine learning models by constructing global explanations in the form of decision trees that approximate the original models. Compared to baselines, the extracted decision trees are substantially more accurate and equally or more interpretable.

Interpretability has become incredibly important as machine learning is increasingly used to inform consequential decisions. We propose to construct global explanations of complex, blackbox models in the form of a decision tree approximating the original model: as long as the decision tree is a good approximation, it mirrors the computation performed by the blackbox model. We devise a novel algorithm for extracting decision tree explanations that actively samples new training points to avoid overfitting. We evaluate our algorithm on a random forest to predict diabetes risk and a learned controller for cart-pole. Compared to several baselines, our decision trees are both substantially more accurate and equally or more interpretable based on a user study. Finally, we describe several insights provided by our interpretations, including a causal issue validated by a physician.
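The idea of extracting a tree while actively sampling new points can be illustrated with a small, standard-library-only sketch. It is not the authors' algorithm. It assumes a toy `blackbox` function standing in for the real model, an independent Gaussian per feature as the input distribution, rejection sampling to draw fresh points inside each node's region, and a greedy Gini-based split. All function names and parameters (`extract`, `sample_in_box`, `n_node`, and so on) are hypothetical.

```python
import random
import statistics

def blackbox(x):
    # Hypothetical stand-in for an opaque model such as a random forest.
    return 1 if (x[0] > 0.3 and x[1] > -0.2) or x[0] > 1.2 else 0

def fit_gaussians(data):
    # Per-feature (mean, stdev); assumes features are independent.
    return [(statistics.mean(c), statistics.stdev(c)) for c in zip(*data)]

def sample_in_box(rng, gaussians, box, n, max_tries=20000):
    # Rejection-sample up to n points from the Gaussians that lie inside box.
    out, tries = [], 0
    while len(out) < n and tries < max_tries:
        tries += 1
        x = [rng.gauss(mu, sd) for mu, sd in gaussians]
        if all(lo < v <= hi for v, (lo, hi) in zip(x, box)):
            out.append(x)
    return out

def gini(labels):
    # Gini impurity for binary labels.
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(points, labels):
    # Exhaustive search for the (feature, threshold) with lowest impurity.
    best = None
    for f in range(len(points[0])):
        for t in sorted({p[f] for p in points})[:-1]:
            left = [y for p, y in zip(points, labels) if p[f] <= t]
            right = [y for p, y in zip(points, labels) if p[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(points)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def extract(rng, gaussians, points, box, depth, n_node=300):
    # Top up the node with fresh samples from its region, labeled by the blackbox.
    points = points + sample_in_box(rng, gaussians, box, n_node - len(points))
    labels = [blackbox(p) for p in points]
    majority = 1 if 2 * sum(labels) >= len(labels) else 0
    split = best_split(points, labels) if depth > 0 and len(set(labels)) > 1 else None
    if split is None:
        return majority  # leaf: predict the majority blackbox label
    _, f, t = split
    lo, hi = box[f]
    left_box = box[:f] + [(lo, t)] + box[f + 1:]
    right_box = box[:f] + [(t, hi)] + box[f + 1:]
    left_pts = [p for p in points if p[f] <= t]
    right_pts = [p for p in points if p[f] > t]
    return (f, t,
            extract(rng, gaussians, left_pts, left_box, depth - 1, n_node),
            extract(rng, gaussians, right_pts, right_box, depth - 1, n_node))

def predict(tree, x):
    # Internal nodes are (feature, threshold, left, right); leaves are labels.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

rng = random.Random(0)
train = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
gaussians = fit_gaussians(train)
box = [(float("-inf"), float("inf"))] * 2
tree = extract(rng, gaussians, train, box, depth=3)

# Fidelity: how often the extracted tree agrees with the blackbox on new inputs.
held_out = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(2000)]
fidelity = sum(predict(tree, x) == blackbox(x) for x in held_out) / len(held_out)
print(f"fidelity to blackbox: {fidelity:.3f}")
```

Because every node is topped up to `n_node` points drawn from its own region, deeper splits are chosen from as many labeled points as the root, rather than from an ever-shrinking share of the original training set. The tree is scored by fidelity to the blackbox rather than by accuracy on ground-truth labels, since the goal is to explain the model.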

