CL | AI | LG | Nov 2, 2018

On Evaluating the Generalization of LSTM Models in Formal Languages

arXiv:1811.01001v11108 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses open questions about the language learning capabilities of RNNs, which matter to researchers in machine learning and formal language theory. It is incremental, however, because it focuses on empirical evaluation and introduces no new methods.

The paper empirically evaluates the inductive learning capabilities of Long Short-Term Memory (LSTM) networks on the simple formal languages $a^nb^n$, $a^nb^nc^n$, and $a^nb^nc^nd^n$, and finds striking differences in model performance across training data regimes and model capacities.

Recurrent Neural Networks (RNNs) are theoretically Turing-complete and have established themselves as a dominant model for language processing. Yet uncertainty remains regarding their language learning capabilities. In this paper, we empirically evaluate the inductive learning capabilities of Long Short-Term Memory (LSTM) networks, a popular extension of simple RNNs, on simple formal languages, in particular $a^nb^n$, $a^nb^nc^n$, and $a^nb^nc^nd^n$. We investigate how various aspects of learning, such as training data regimes and model capacity, influence generalization to unobserved samples. We find striking differences in model performance under different training settings and highlight the need for careful analysis and assessment when making claims about the learning capabilities of neural network models.
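To make the three languages concrete, the following is a minimal sketch of how their strings could be generated and checked for membership. It is not the paper's code. The helper names `make_string` and `is_member`, the restriction to $n \geq 1$, and the particular length ranges in the split are all assumptions chosen for illustration.

```python
def make_string(n: int, alphabet: str = "ab") -> str:
    """Return n copies of each symbol in order, e.g. a^n b^n for alphabet 'ab'."""
    if n < 1:
        # Assumption for this sketch: the empty string (n = 0) is excluded.
        raise ValueError("n must be a positive integer")
    return "".join(symbol * n for symbol in alphabet)


def is_member(s: str, alphabet: str = "ab") -> bool:
    """Check whether s has the form x1^n x2^n ... xk^n for some n >= 1."""
    k = len(alphabet)
    if not s or len(s) % k != 0:
        return False
    # The only candidate n is len(s) / k, so compare against that one string.
    return s == make_string(len(s) // k, alphabet)


# Hypothetical length-based split: fit on short strings, test on longer ones.
# The ranges below are illustrative and are not taken from the paper.
train = [make_string(n, "abc") for n in range(1, 6)]
held_out = [make_string(n, "abc") for n in range(6, 11)]
```

Passing `"ab"`, `"abc"`, or `"abcd"` as `alphabet` selects $a^nb^n$, $a^nb^nc^n$, or $a^nb^nc^nd^n$ respectively. Splitting by `n` is one simple way to probe generalization to unobserved lengths.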
