Visualizing and Understanding Neural Models in NLP
This work addresses an interpretability challenge facing NLP researchers and practitioners, though it is incremental in that it adapts visualization techniques from computer vision.
The paper tackles the problem of interpreting neural models in NLP by developing four visualization strategies for understanding compositionality, including negation and intensification. It tests them on sentiment analysis with simple recurrent nets and LSTMs, and shows how the methods reveal markedness asymmetries and differences between model types.
While neural networks have been successfully applied to many NLP tasks, the resulting vector-based models are very difficult to interpret. For example, it's not clear how they achieve {\em compositionality}, building sentence meaning from the meanings of words and phrases. In this paper we describe four strategies for visualizing compositionality in neural models for NLP, inspired by similar work in computer vision. We first plot unit values to visualize compositionality of negation, intensification, and concessive clauses, allowing us to see well-known markedness asymmetries in negation. We then introduce three simple and straightforward methods for visualizing a unit's {\em salience}, the amount it contributes to the final composed meaning: (1) gradient back-propagation, (2) the variance of a token from the average word node, (3) LSTM-style gates that measure information flow. We test our methods on sentiment using simple recurrent nets and LSTMs. Our general-purpose methods may have wide applications for understanding compositionality and other semantic properties of deep networks, and also shed light on why LSTMs outperform simple recurrent nets.
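As a rough illustration of the first salience method, gradient back-propagation, the sketch below computes the absolute gradient of a scalar sentiment score with respect to each input embedding dimension in a tiny tanh recurrent net. This is a minimal sketch under stated assumptions, not the paper's implementation: the network shape, the random weights, and the names `rnn_forward` and `gradient_salience` are all hypothetical, and the use of the absolute value of the first derivative as the salience score is an assumption about how the gradient is turned into a score.

```python
import numpy as np

def rnn_forward(E, W_x, W_h, w_out):
    """Run a simple tanh recurrent net over token embeddings E (T x d).

    Returns the scalar score and all hidden states (H[0] is the zero start state).
    """
    T = E.shape[0]
    H = np.zeros((T + 1, W_h.shape[0]))
    for t in range(T):
        H[t + 1] = np.tanh(W_x @ E[t] + W_h @ H[t])
    score = float(w_out @ H[T])
    return score, H

def gradient_salience(E, W_x, W_h, w_out):
    """Return |d score / d E[t, j]| for every token t and embedding dimension j."""
    _, H = rnn_forward(E, W_x, W_h, w_out)
    T = E.shape[0]
    grad_E = np.zeros(E.shape)
    dh = w_out.astype(float)  # gradient of the score w.r.t. the last hidden state
    for t in reversed(range(T)):
        da = dh * (1.0 - H[t + 1] ** 2)  # back through tanh
        grad_E[t] = W_x.T @ da           # gradient w.r.t. the embedding at step t
        dh = W_h.T @ da                  # gradient passed to the previous hidden state
    return np.abs(grad_E)

# Hypothetical toy setup: 5 tokens, 4-dim embeddings, 3 hidden units, random weights.
rng = np.random.default_rng(0)
E = rng.normal(size=(5, 4))
W_x = rng.normal(size=(3, 4))
W_h = rng.normal(size=(3, 3))
w_out = rng.normal(size=3)

salience = gradient_salience(E, W_x, W_h, w_out)  # shape (tokens, dimensions)
per_token = salience.sum(axis=1)                  # one aggregate score per token
```

Plotting `salience` as a heat map with tokens on one axis and embedding dimensions on the other gives the kind of picture the abstract describes; with a trained model in place of the random weights, larger values mark the inputs the score is most sensitive to.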