LG, AI, ML | Sep 12, 2019

New Perspective of Interpretability of Deep Neural Networks

arXiv:1909.07156v1 | 6 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of defining interpretability for deep neural networks, which is crucial for researchers and practitioners in AI, but it is incremental as it builds on existing interpretability research.

The paper tackles the problem of vague definitions in deep neural network interpretability by proposing a new definition called human predictability, which measures how easily humans can predict changes in model inference when perturbing the model, and introduces an example of a highly human-predictable DNN.

Deep neural networks (DNNs) are known as black-box models; in other words, it is difficult to interpret the internal state of the model. Improving the interpretability of DNNs is an active research topic. However, at present the definition of interpretability for DNNs is vague, and the question of what constitutes a highly explanatory model is still controversial. To address this issue, we define the human predictability of a model as one component of the interpretability of DNNs. The human predictability proposed in this paper is defined by how easily a human can predict the change in inference when the DNN model is perturbed. In addition, we introduce one example of a highly human-predictable DNN. We argue that our definition will help research on the interpretability of DNNs across various types of applications.
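
As a rough illustration of the idea in the abstract, the sketch below perturbs one weight of a toy linear model and compares the actual change in output with a guess a person could make by hand. It is not the paper's formal definition or its example DNN. The names `linear_model`, `perturbation_effect`, and `gap`, and the use of a first-order hand calculation as a stand-in for a human's prediction, are all assumptions made for illustration.

```python
import numpy as np


def linear_model(w, x):
    """Toy model (hypothetical): the inference is the dot product of w and x."""
    return float(w @ x)


def perturbation_effect(model, w, x, index, delta):
    """Actual change in the model output when w[index] is shifted by delta."""
    w_perturbed = w.copy()
    w_perturbed[index] += delta
    return model(w_perturbed, x) - model(w, x)


w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 3.0])
delta = 0.1

# Assumption: a person guesses the effect of the perturbation by hand as
# delta * x[index], which is exact for a linear model.
guessed = delta * x[2]
actual = perturbation_effect(linear_model, w, x, index=2, delta=delta)

# Illustrative gap between the guessed and the actual change in inference.
gap = abs(guessed - actual)
print(f"guessed={guessed:.3f} actual={actual:.3f} gap={gap:.2e}")
```

In this toy setting a small gap would suggest that the effect of the perturbation is easy to anticipate. For a deep nonlinear network the same hand calculation would generally not be exact, which is the kind of difficulty the abstract describes.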

