ML LG May 16, 2017

Learning how to explain neural networks: PatternNet and PatternAttribution

Pieter-Jan Kindermans, Kristof T. Schütt, Maximilian Alber, Klaus-Robert Müller, Dumitru Erhan, Been Kim, Sven Dähne

arXiv:1705.05598v2 31.1368 citations

Originality: Incremental advance

AI Analysis

This work addresses the reliability of explanation methods for neural networks, a concern for researchers and practitioners in AI interpretability. The contribution is incremental, since it builds on prior explanation methods.

The authors tackled the problem that existing explanation methods for neural networks (DeConvNet, Guided BackProp, LRP) fail to produce theoretically correct explanations even for simple linear models, which are a limiting case of neural networks. They proposed PatternNet and PatternAttribution, two new techniques that are theoretically sound for linear models and provide improved explanations for deep networks.

DeConvNet, Guided BackProp, and LRP were invented to better understand deep neural networks. We show that these methods do not produce the theoretically correct explanation for a linear model. Yet they are used on multi-layer networks with millions of parameters. This is a cause for concern, since linear models are simple neural networks. We argue that explanation methods for neural nets should work reliably in the limit of simplicity, the linear models. Based on our analysis of linear models, we propose a generalization that yields two explanation techniques (PatternNet and PatternAttribution) that are theoretically sound for linear models and produce improved explanations for deep networks.
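
The claim about linear models can be illustrated with a small numerical sketch. Everything below is an assumption for illustration and is not stated in the abstract: the two-dimensional toy data (a signal direction `a_s` plus a distractor direction `a_d`), the variable names, and the use of the covariance-based estimate cov(x, y) / var(y) as the "pattern". Under these assumptions, the weight vector of the linear model must cancel the distractor and so does not point along the signal, while the covariance-based pattern approximately recovers the signal direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical toy setup: x = a_s * y + a_d * eps
a_s = np.array([1.0, 0.0])   # assumed signal direction
a_d = np.array([1.0, 1.0])   # assumed distractor direction
y = rng.normal(size=n)       # quantity the model should extract
eps = rng.normal(size=n)     # distractor strength, independent of y
X = np.outer(y, a_s) + np.outer(eps, a_d)

# Fit a linear model y = X @ w by least squares.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Covariance-based pattern estimate: cov(x, y) / var(y).
Xc = X - X.mean(axis=0)
yc = y - y.mean()
pattern = (Xc.T @ yc / n) / y.var()

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print("weights:", w)
print("pattern:", pattern)
print("cos(weights, signal):", cosine(w, a_s))
print("cos(pattern, signal):", cosine(pattern, a_s))
```

In this sketch the weights come out close to (1, -1), because the model has to subtract the distractor, so a method that reports the weights (or the gradient, which equals the weights for a linear model) as the explanation highlights the distractor-cancelling direction instead of the signal.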
