Autoencoder Node Saliency: Selecting Relevant Latent Representations
This work addresses the interpretability of autoencoders for researchers and practitioners, but it is incremental, as it builds on existing methods for feature selection.
The authors tackled the problem of interpreting autoencoder latent representations by proposing a supervised node saliency method to rank hidden nodes based on class distributions, demonstrating its ability to explain what trained autoencoders have learned on real datasets.
The autoencoder is an artificial neural network model that learns hidden representations of unlabeled data. With a linear transfer function, it is similar to principal component analysis (PCA). While both methods use weight vectors for linear transformations, the autoencoder has no indicator analogous to the eigenvalues that PCA pairs with its eigenvectors. We propose a novel supervised node saliency (SNS) method that ranks the hidden nodes by comparing class distributions of latent representations against a fixed reference distribution. The latent representations of a hidden node can be described using a one-dimensional histogram. We apply normalized entropy difference (NED) to measure the "interestingness" of the histograms, and derive a property of NED values that identifies a good classifying node. By applying our methods to real data sets, we demonstrate the ability of SNS to explain what the trained autoencoders have learned.
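The histogram-based scoring described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not confirm: activations are taken to lie in [0, 1] (for example, sigmoid outputs), NED is assumed to be the gap between the maximum entropy and the histogram entropy, normalized by the maximum entropy over a fixed number of bins, and the supervised score is assumed to be a binary cross-entropy between a node's activations and the 0/1 class labels used as the fixed reference. The names `normalized_entropy_difference`, `supervised_saliency`, and `rank_nodes` are hypothetical.

```python
import numpy as np


def normalized_entropy_difference(activations, n_bins=10):
    """Assumed NED: 0 for a uniform histogram, 1 when one bin holds everything."""
    # One-dimensional histogram of a single node's activations over [0, 1].
    counts, _ = np.histogram(activations, bins=n_bins, range=(0.0, 1.0))
    p = counts[counts > 0] / counts.sum()  # drop empty bins to avoid log(0)
    entropy = -np.sum(p * np.log2(p))
    max_entropy = np.log2(n_bins)  # assumption: normalize by the total bin count
    return float((max_entropy - entropy) / max_entropy)


def supervised_saliency(activations, labels, eps=1e-12):
    """Assumed supervised score: binary cross-entropy against 0/1 labels (lower is better)."""
    a = np.clip(activations, eps, 1.0 - eps)  # keep the logarithms finite
    return float(-np.mean(labels * np.log(a) + (1 - labels) * np.log(1 - a)))


def rank_nodes(latent, labels):
    """Rank hidden nodes (columns of `latent`) from most to least class-relevant."""
    scores = np.array(
        [supervised_saliency(latent[:, j], labels) for j in range(latent.shape[1])]
    )
    return np.argsort(scores), scores


# Synthetic illustration only (not data from the paper).
labels = np.array([0, 1] * 50)
node_separating = np.where(labels == 1, 0.95, 0.05)  # splits the two classes
node_constant = np.full(labels.shape, 0.5)           # carries no class information
latent = np.column_stack([node_constant, node_separating])

order, scores = rank_nodes(latent, labels)
ned_uniform = normalized_entropy_difference((np.arange(100) + 0.5) / 100)
ned_peaked = normalized_entropy_difference(np.full(100, 0.05))
```

In this sketch, a node whose activations concentrate near 0 for one class and near 1 for the other receives a low cross-entropy score and is ranked first, while its histogram is far from uniform and therefore has a high NED. Whether this matches the property of NED values established in the paper is not something the abstract specifies.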