Attribution Analysis of Grammatical Dependencies in LSTMs
This paper addresses a key interpretability question in NLP that matters to both researchers and practitioners, but its contribution is incremental, since it builds on prior work on LSTM analysis.
The paper asks whether LSTM language models capture grammatical dependencies such as subject-verb agreement through spurious correlations or by genuinely matching verbs with their subjects. It finds that agreement performance correlates with the model's ability to distinguish subjects from other nouns, which suggests robust syntactic representations.
LSTM language models have been shown to capture syntax-sensitive grammatical dependencies such as subject-verb agreement with a high degree of accuracy (Linzen et al., 2016, inter alia). However, questions remain regarding whether they do so using spurious correlations, or whether they are truly able to match verbs with their subjects. This paper argues for the latter hypothesis. Using layer-wise relevance propagation (Bach et al., 2015), a technique that quantifies the contributions of input features to model behavior, we show that LSTM performance on number agreement is directly correlated with the model's ability to distinguish subjects from other nouns. Our results suggest that LSTM language models are able to infer robust representations of syntactic dependencies.
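To make the idea of layer-wise relevance propagation concrete, the following is a minimal sketch of the epsilon rule applied to a single linear layer. It is not the paper's implementation: the function name lrp_linear, the toy weights, and the stabiliser value are assumptions chosen for illustration, and applying LRP to an LSTM requires additional propagation rules for the gating operations that are not shown here.

```python
def lrp_linear(x, W, b, relevance_out, eps=1e-6):
    """Epsilon-rule LRP for one linear layer z = xW + b (illustrative only).

    x: list of n inputs
    W: n x m weight matrix as a list of rows
    b: list of m biases
    relevance_out: list of m relevance scores assigned to the outputs
    Returns a list of n relevance scores for the inputs.
    """
    n, m = len(x), len(b)
    # Forward pass: pre-activation z_j = sum_i x_i * W[i][j] + b_j
    z = [sum(x[i] * W[i][j] for i in range(n)) + b[j] for j in range(m)]
    relevance_in = [0.0] * n
    for j in range(m):
        # The stabiliser keeps the denominator away from zero.
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        for i in range(n):
            # Each input receives a share of R_j proportional to its
            # contribution x_i * W[i][j] to the pre-activation z_j.
            relevance_in[i] += x[i] * W[i][j] / denom * relevance_out[j]
    return relevance_in


# Hypothetical toy layer: three input features, one output unit.
x = [1.0, 2.0, 0.0]
W = [[0.5], [0.25], [3.0]]
b = [0.0]
scores = lrp_linear(x, W, b, relevance_out=[1.0])
```

In this toy case the first two inputs contribute equally to the output and so receive equal relevance, while the third input is zero and receives none. With a zero bias and a small stabiliser, the input scores sum to approximately the output relevance, which is the conservation property that lets relevance be traced back from a prediction to individual input features.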