Using Priming to Uncover the Organization of Syntactic Representations in Neural Language Models
This work addresses the interpretability of neural language models and is aimed at researchers in computational linguistics and psycholinguistics. Its contribution is incremental in that it builds on existing priming paradigms.
The authors address how neural language models represent syntactic structure by proposing a priming-based analysis technique, and they show that LSTM language models organize sentences with relative clauses hierarchically in a linguistically interpretable way.
Neural language models (LMs) perform well on tasks that require sensitivity to syntactic structure. Drawing on the syntactic priming paradigm from psycholinguistics, we propose a novel technique to analyze the representations that enable such success. By establishing a gradient similarity metric between structures, this technique allows us to reconstruct the organization of the LMs' syntactic representational space. We use this technique to demonstrate that LSTM LMs' representations of different types of sentences with relative clauses are organized hierarchically in a linguistically interpretable manner, suggesting that the LMs track abstract properties of the sentence.
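The pipeline described above (measure a priming effect between pairs of structures, turn it into a graded similarity, then recover a hierarchy) can be illustrated with a minimal sketch. Several things here are assumptions rather than details given in the text: the priming effect is taken to be the mean drop in surprisal on target sentences after the model is exposed to prime sentences, the two priming directions are averaged to get a symmetric similarity, and the hierarchy is built with plain average-linkage agglomerative clustering. The structure names and all numeric values are invented toy inputs, not results.

```python
from itertools import combinations


def adaptation_effect(surprisal_before, surprisal_after):
    """Mean drop in surprisal on target sentences after exposure to primes.

    Assumption: priming is quantified as surprisal reduction.
    """
    drops = [b - a for b, a in zip(surprisal_before, surprisal_after)]
    return sum(drops) / len(drops)


def symmetric_similarity(effects):
    """Average the two priming directions for every pair of structures.

    effects[(x, y)] is the effect on structure y after priming with x.
    """
    labels = sorted({s for pair in effects for s in pair})
    sim = {}
    for a, b in combinations(labels, 2):
        sim[frozenset((a, b))] = (effects[(a, b)] + effects[(b, a)]) / 2
    return labels, sim


def average_link(c1, c2, sim):
    """Mean pairwise similarity between the members of two clusters."""
    vals = [sim[frozenset((a, b))] for a in c1 for b in c2]
    return sum(vals) / len(vals)


def agglomerate(labels, sim):
    """Repeatedly merge the two most similar clusters; return a nested tuple."""
    clusters = {(label,): label for label in labels}  # members -> subtree
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: average_link(pair[0], pair[1], sim))
        subtree = (clusters.pop(c1), clusters.pop(c2))
        clusters[c1 + c2] = subtree
    return next(iter(clusters.values()))


# Toy, hypothetical priming effects (prime structure, target structure).
toy_effects = {
    ("subject_RC", "object_RC"): 0.8,
    ("object_RC", "subject_RC"): 0.6,
    ("subject_RC", "coordination"): 0.2,
    ("coordination", "subject_RC"): 0.1,
    ("object_RC", "coordination"): 0.3,
    ("coordination", "object_RC"): 0.1,
}

labels, sim = symmetric_similarity(toy_effects)
tree = agglomerate(labels, sim)
print(tree)
```

With these toy inputs the two relative-clause structures merge first and the coordination structure joins last, which is the kind of nested grouping the analysis looks for. In practice the surprisal values would come from an actual language model, and a library routine for hierarchical clustering could replace the hand-written loop.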