Word Interdependence Exposes How LSTMs Compose Representations
This work offers incremental insight into the internal mechanisms of LSTM language models and is relevant to researchers in NLP and neural network interpretability.
The authors address the problem of understanding how LSTMs compose hierarchical representations by introducing a novel measure of word interdependence based on interactions at the internal gates. On synthetic data, they show that high interdependence can hurt generalization; on English language data, they find that interdependence is higher for word pairs that are more closely linked syntactically.
Recent work in NLP shows that LSTM language models capture compositional structure in language data. For a closer look at how these representations are composed hierarchically, we present a novel measure of interdependence between word meanings in an LSTM, based on their interactions at the internal gates. To explore how compositional representations arise over training, we conduct simple experiments on synthetic data, which illustrate our measure by showing how high interdependence can hurt generalization. These synthetic experiments also illustrate a specific hypothesis about how hierarchical structures are discovered over the course of training: that parent constituents rely on effective representations of their children, rather than on learning long-range relations independently. We further support this measure with experiments on English language data, where interdependence is higher for more closely syntactically linked word pairs.
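To make the idea of gate-level interaction concrete, the following is a minimal sketch, not the authors' measure. It assumes a toy scalar unit (a sigmoid gate multiplied by a tanh candidate) and scores how far the unit's response to two summed contributions departs from additivity. The names `gated_unit`, `linear_unit`, and `interaction`, and the weights used, are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_unit(x, w_gate=1.5, w_cand=0.8):
    """Toy scalar unit (assumption): sigmoid gate times tanh candidate."""
    return sigmoid(w_gate * x) * math.tanh(w_cand * x)


def linear_unit(x):
    """Purely additive baseline with no gating."""
    return 0.5 * x


def interaction(f, a, b):
    """Non-additive part of f when contributions a and b are summed.

    This is zero whenever f is additive in a and b, and nonzero when
    the two contributions interact (for example, through a gate).
    """
    return f(a + b) - f(a) - f(b) + f(0.0)


# Two hypothetical word contributions to the same pre-activation.
a, b = 1.0, -2.0
additive_score = interaction(linear_unit, a, b)
gated_score = interaction(gated_unit, a, b)
```

Under these assumptions, the additive baseline yields a score of zero while the gated unit does not, which is the sense in which a multiplicative gate makes two contributions interdependent rather than separable.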