CL · May 29, 2018

LSTMs Exploit Linguistic Attributes of Data

arXiv:1805.11653v21103 citations
Originality: Synthesis-oriented
AI Analysis

This work examines how characteristics of the training data influence what an LSTM learns, a question relevant to researchers in NLP and machine learning. It is incremental: it builds on existing knowledge and introduces no new methods.

The study investigates how properties of natural language data affect an LSTM's ability to recall elements from its input sequence. Models trained on natural language data recall tokens from much longer sequences than models trained on non-language sequential data, and the LSTM solves the task by using a subset of its neurons to count timesteps.

While recurrent neural networks have found success in a variety of natural language processing applications, they are general models of sequential data. We investigate how the properties of natural language data affect an LSTM's ability to learn a nonlinguistic task: recalling elements from its input. We find that models trained on natural language data are able to recall tokens from much longer sequences than models trained on non-language sequential data. Furthermore, we show that the LSTM learns to solve the memorization task by explicitly using a subset of its neurons to count timesteps in the input. We hypothesize that the patterns and structure in natural language data enable LSTMs to learn by providing approximate ways of reducing loss, but understanding the effect of different training data on the learnability of LSTMs remains an open question.
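To make the recall task concrete, here is a minimal sketch of how training examples for such a memorization task might be generated. It is an illustration under stated assumptions, not the authors' setup: the abstract only says the model must recall elements from its input, so the choice of the middle token as the target, the uniform random token sampling (a stand-in for the non-language condition), and the names `make_recall_example` and `make_dataset` are all hypothetical.

```python
import random


def make_recall_example(vocab_size, seq_len, rng):
    """Build one (tokens, target) pair for a toy recall task.

    Assumption: tokens are drawn uniformly at random, so the sequence
    carries no linguistic structure.
    """
    tokens = [rng.randrange(vocab_size) for _ in range(seq_len)]
    # Assumption: the element to recall is the middle token. A model that
    # reads the sequence left to right must track its position to know
    # which token to store, which is where counting timesteps would help.
    target = tokens[seq_len // 2]
    return tokens, target


def make_dataset(num_examples, vocab_size, seq_len, seed=0):
    """Generate a reproducible list of (tokens, target) pairs."""
    rng = random.Random(seed)
    return [
        make_recall_example(vocab_size, seq_len, rng)
        for _ in range(num_examples)
    ]


if __name__ == "__main__":
    for tokens, target in make_dataset(3, vocab_size=10, seq_len=7):
        print(tokens, "->", target)
```

Replacing the uniform sampler with sentences drawn from a text corpus would give the natural language condition; comparing how long `seq_len` can grow before a trained model fails is the kind of measurement the abstract describes.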

