State Gradients for RNN Memory Analysis
This work provides a method for analyzing RNN memory, an incremental contribution aimed at researchers in interpretable machine learning and natural language processing.
The authors tackle the problem of understanding what information RNNs retain in their hidden states by introducing a gradient-based framework that decomposes state gradients via SVD to analyze how well input embeddings are retained. They apply this to LSTM language models, quantifying how long and to what extent word classes are remembered on average over a corpus.
We present a framework for analyzing what the hidden state of an RNN remembers about its input embeddings. Our approach is inspired by backpropagation, in the sense that we compute the gradients of the states with respect to the input embeddings. The gradient matrix is decomposed with singular value decomposition (SVD) to analyze which directions in the embedding space are best transferred to the hidden state space; these are the directions associated with the largest singular values. We apply our approach to LSTM language models and investigate to what extent and for how long certain classes of words are remembered on average over a given corpus. Additionally, the extent to which a specific property or relationship is remembered by the RNN can be tracked by comparing a vector characterizing that property with the direction(s) in embedding space that are best preserved in hidden state space.
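The pipeline described above (state gradient, SVD, comparison with a property vector) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. To keep the Jacobian analytic and the example short, it assumes a plain tanh RNN in place of an LSTM, with random weights and random vectors standing in for word embeddings. The names `rnn_state_jacobians` and `alignment` are hypothetical, and measuring alignment as the norm of the projection onto the top-k right singular vectors is one possible choice, not necessarily the one used in the paper.

```python
import numpy as np

def rnn_state_jacobians(W_x, W_h, embeddings):
    """Run a tanh RNN (an assumption; the paper uses LSTMs) and return the
    final state h_T together with the Jacobians dh_T/dx_t for every step t."""
    T = embeddings.shape[0]
    d_hid = W_h.shape[0]
    h = np.zeros(d_hid)
    local_in, local_rec = [], []
    for x in embeddings:
        h = np.tanh(W_x @ x + W_h @ h)
        d_tanh = 1.0 - h ** 2                      # derivative of tanh at step t
        local_in.append(d_tanh[:, None] * W_x)     # dh_t / dx_t
        local_rec.append(d_tanh[:, None] * W_h)    # dh_t / dh_{t-1}
    # Chain rule, accumulated backwards in time as in backpropagation:
    # dh_T/dx_t = (dh_T/dh_{T-1} ... dh_{t+1}/dh_t) @ dh_t/dx_t
    jacobians = [None] * T
    carry = np.eye(d_hid)
    for t in reversed(range(T)):
        jacobians[t] = carry @ local_in[t]
        carry = carry @ local_rec[t]
    return h, jacobians

def alignment(property_vec, J, k=1):
    """Norm of the projection of a unit property vector onto the k
    embedding-space directions that J transfers best (top right singular
    vectors). The value lies between 0 and 1."""
    _, _, Vt = np.linalg.svd(J, full_matrices=False)
    p = property_vec / np.linalg.norm(property_vec)
    return float(np.linalg.norm(Vt[:k] @ p))

# Toy setup: random weights and random "embeddings" (illustrative only).
rng = np.random.default_rng(0)
d_emb, d_hid, T = 5, 4, 6
W_x = rng.normal(scale=0.5, size=(d_hid, d_emb))
W_h = rng.normal(scale=0.5, size=(d_hid, d_hid))
embeddings = rng.normal(size=(T, d_emb))
property_vec = rng.normal(size=d_emb)   # hypothetical property direction

h_T, jacobians = rnn_state_jacobians(W_x, W_h, embeddings)
for t, J in enumerate(jacobians):
    singular_values = np.linalg.svd(J, compute_uv=False)
    delay = T - 1 - t                   # steps between input t and the final state
    print(f"delay={delay} top singular value={singular_values[0]:.4f} "
          f"alignment={alignment(property_vec, J):.4f}")
```

Under these assumptions, the largest singular value at each delay indicates how strongly the best-preserved embedding direction still influences the final state, and averaging such values over many inputs from a word class would give a corpus-level picture in the spirit of the analysis described above.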