LG, Dec 3, 2020

Interpretability and Explainability: A Machine Learning Zoo Mini-tour

arXiv:2012.01805v2, 137 citations
Originality: Synthesis-oriented
AI Analysis

This paper provides a conceptual overview of interpretability and explainability for machine learning practitioners, clarifying the distinction between these two concepts.

This review explores the concepts of interpretability and explainability in machine learning, highlighting their distinct definitions and illustrating them with concrete examples from state-of-the-art deep learning methods. It aims to provide a primer for a general machine learning audience interested in these topics.

In this review, we examine the problem of designing interpretable and explainable machine learning models. Interpretability and explainability lie at the core of many machine learning and statistical applications in medicine, economics, law, and the natural sciences. Although interpretability and explainability have escaped a clear universal definition, many techniques motivated by these properties have been developed over the past 30 years, with the focus currently shifting towards deep learning methods. In this review, we emphasise the divide between interpretability and explainability and illustrate these two different research directions with concrete examples of the state of the art. The review is intended for a general machine learning audience with an interest in exploring the problems of interpretation and explanation beyond logistic regression or random forest variable importance. This work is not an exhaustive literature survey, but rather a primer focusing selectively on certain lines of research which the authors found interesting or informative.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
