Clusterability in Neural Networks
This work aims to make neural networks more interpretable to engineers by partitioning them into meaningful clusters, although the contribution is incremental.
The paper tackles the problem of identifying internal structure in neural networks by measuring their clusterability. It finds that trained networks are more clusterable than randomly initialized ones, and that clusterability can be promoted during training with little reduction in accuracy, at least for multi-layer perceptrons.
The learned weights of a neural network have often been considered devoid of scrutable internal structure. In this paper, however, we look for structure in the form of clusterability: how well a network can be divided into groups of neurons with strong internal connectivity but weak external connectivity. We find that a trained neural network is typically more clusterable than randomly initialized networks, and often more clusterable than random networks with the same distribution of weights. We also exhibit novel methods to promote clusterability in neural network training, and find that in multi-layer perceptrons they lead to more clusterable networks with little reduction in accuracy. Understanding and controlling the clusterability of neural networks will hopefully render their inner workings more interpretable to engineers by facilitating partitioning into meaningful clusters.
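The abstract defines clusterability only informally. A minimal sketch that makes it concrete follows, under several assumptions that are not taken from the paper's code. The MLP is treated as a graph whose nodes are neurons and whose edge weights are absolute weight magnitudes. A partition is scored by normalized cut (n-cut, lower means more clusterable). The partition comes from a simple two-way spectral split. The "random networks with the same distribution of weights" baseline is approximated by shuffling each layer's weights. All function names (`mlp_adjacency`, `spectral_bipartition`, `ncut`, `shuffle_within_layers`, `clusterability`, `block_layer`) are hypothetical.

```python
import numpy as np


def mlp_adjacency(weights):
    """Symmetric neuron-by-neuron adjacency matrix for an MLP.

    weights[i] has shape (n_i, n_{i+1}); each edge weight is |w| (sign ignored).
    """
    sizes = [weights[0].shape[0]] + [w.shape[1] for w in weights]
    offsets = np.cumsum([0] + sizes)
    adj = np.zeros((offsets[-1], offsets[-1]))
    for i, w in enumerate(weights):
        # Block connecting layer i (rows) to layer i + 1 (columns).
        adj[offsets[i]:offsets[i + 1], offsets[i + 1]:offsets[i + 2]] = np.abs(w)
    return adj + adj.T


def spectral_bipartition(adj):
    """Two-way split from the sign of the Fiedler vector (Shi-Malik style).

    Assumes every neuron has at least one nonzero weight (no zero degrees).
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}.
    lap = np.eye(len(deg)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    fiedler = d_inv_sqrt * eigvecs[:, 1]  # second-smallest, mapped back by D^{-1/2}
    return (fiedler > 0).astype(int)


def ncut(adj, labels):
    """Normalized cut: sum over clusters of cut(C, rest) / vol(C). Lower is more clusterable."""
    total = 0.0
    for c in np.unique(labels):
        inside = labels == c
        vol = adj[inside].sum()
        cut = adj[np.ix_(inside, ~inside)].sum()
        total += cut / vol
    return total


def shuffle_within_layers(weights, rng):
    """Keep each layer's weight distribution but destroy its structure."""
    return [rng.permutation(w.ravel()).reshape(w.shape) for w in weights]


def clusterability(weights, n_shuffles=20, seed=0):
    """Return (n-cut of the network, mean n-cut over shuffled copies)."""
    rng = np.random.default_rng(seed)
    adj = mlp_adjacency(weights)
    score = ncut(adj, spectral_bipartition(adj))
    baseline = []
    for _ in range(n_shuffles):
        shuffled = mlp_adjacency(shuffle_within_layers(weights, rng))
        baseline.append(ncut(shuffled, spectral_bipartition(shuffled)))
    return score, float(np.mean(baseline))


def block_layer(n, strong=1.0, weak=0.01):
    """Toy weight matrix with two dense modules and weak cross-module links."""
    w = np.full((n, n), weak)
    h = n // 2
    w[:h, :h] = strong
    w[h:, h:] = strong
    return w


modular = [block_layer(8), block_layer(8)]  # an 8-8-8 MLP with two modules
score, baseline = clusterability(modular)
print(f"n-cut {score:.4f} vs shuffled mean {baseline:.4f}")
```

On this hand-built two-module network the n-cut is far below the shuffled mean, which is the kind of gap the abstract describes for trained networks. The two-way split is chosen only for brevity. Networks with more than two modules would need a k-way partition, for example k-means on several Laplacian eigenvectors.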