ML LG Oct 3, 2018

Interpreting Layered Neural Networks via Hierarchical Modular Representation

arXiv:1810.01588v18.722 citations

Originality: Incremental advance

AI Analysis

This work addresses an interpretability challenge faced by researchers and practitioners who use neural networks, but the advance is incremental because it builds on existing clustering methods.

The paper tackles the problem of interpreting layered neural networks by proposing a hierarchical clustering method to reveal tree-structured relationships among hidden units, addressing the lack of prior knowledge on optimal decomposition resolution and correlation signs.

Interpreting the prediction mechanism of complex models is currently one of the most important tasks in the machine learning field, especially for layered neural networks, which have achieved high predictive performance on various practical data sets. To reveal the global structure of a trained neural network in an interpretable way, a series of clustering methods has been proposed, which decompose the units into clusters according to the similarity of their inference roles. The main problems in these studies were that (1) there was no prior knowledge about the optimal resolution for the decomposition, that is, the appropriate number of clusters, and (2) there was no method for determining whether the outputs of each cluster have a positive or negative correlation with the input and output dimension values. In this paper, to solve these problems, we propose a method for obtaining a hierarchical modular representation of a layered neural network. Applying a hierarchical clustering method to a trained network reveals a tree-structured relationship among hidden layer units, based on their feature vectors, which are defined by their correlation with the input and output dimension values.
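The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the exact feature-vector definition or linkage criterion, so the Pearson correlation of hidden-unit activations with each input and output dimension, Ward linkage, the toy data, and the function name `unit_feature_vectors` are all assumptions made here for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def unit_feature_vectors(inputs, hidden, outputs):
    """Return an (n_hidden, n_in + n_out) matrix of Pearson correlations.

    Row i holds the correlation of hidden unit i with every input
    dimension followed by every output dimension (an assumed definition).
    """
    def standardize(a):
        a = a - a.mean(axis=0)
        std = a.std(axis=0)
        std[std == 0] = 1.0  # constant columns get zero correlation
        return a / std

    h = standardize(hidden)
    io = standardize(np.hstack([inputs, outputs]))
    return h.T @ io / len(h)


# Toy stand-in for a trained network (hypothetical data, not from the paper):
# units 0 and 1 follow input 0, units 2 and 3 follow the negative of input 1.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
noise = 0.1 * rng.normal(size=(200, 4))
hidden = np.column_stack([x[:, 0], x[:, 0], -x[:, 1], -x[:, 1]]) + noise
y = (hidden[:, :2].sum(axis=1) - hidden[:, 2:].sum(axis=1)).reshape(-1, 1)

features = unit_feature_vectors(x, hidden, y)

# Build the full tree once; Ward linkage is an assumption of this sketch.
tree = linkage(features, method="ward")

# The tree can be cut at any resolution afterwards, so the number of
# clusters does not have to be fixed in advance.
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
print(np.round(features, 2))
```

Because the features are signed correlations, the sign of each entry indicates whether a unit (or, after averaging, a cluster) relates positively or negatively to a given input or output dimension, and `fcluster` can be called again with a different `t` to inspect a coarser or finer decomposition of the same tree.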
