Interpretable Textual Neuron Representations for NLP
This work addresses the need for interpretability in NLP models, but it is incremental as it transfers existing methods from computer vision to NLP.
The authors tackled the problem of creating interpretable neuron representations for NLP models, similar to those in computer vision, and found that gradient ascent with a gumbel softmax layer produces n-gram representations that outperform naive corpus search in terms of target neuron activation.
Input optimization methods, such as Google Deep Dream, create interpretable representations of neurons for computer vision DNNs. We propose and evaluate ways of transferring this technology to NLP. Our results suggest that gradient ascent with a gumbel softmax layer produces n-gram representations that outperform naive corpus search in terms of target neuron activation. The representations highlight differences in syntax awareness between the language and visual models of the Imaginet architecture.