LG · NE · ML · Oct 29, 2018

Counting in Language with RNNs

arXiv:1810.12411v2
Originality: Synthesis-oriented
AI Analysis

This paper addresses a specific performance gap between RNN architectures on NLP tasks, but it is incremental because it builds on existing knowledge about LSTM and GRU mechanisms.

The paper investigates why LSTMs outperform GRUs in language modeling and machine translation, attributes the gap to better counting ability, and demonstrates this through analysis on simplified languages: Context-Free and Context-Sensitive Languages.

In this paper we examine a possible reason for the LSTM outperforming the GRU on language modeling and, more specifically, machine translation. We hypothesize that this has to do with counting, a consistent theme across the literature on long-term dependence, counting, and language modeling for RNNs. Using simplified forms of language -- Context-Free and Context-Sensitive Languages -- we show exactly how the LSTM performs its counting based on its cell states during inference and why the GRU cannot perform as well.
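To make the counting idea concrete, here is a minimal sketch of how an additive cell state can track a count on a string such as a^n b^n, next to a GRU-style update for comparison. It is an illustration under stated assumptions, not the paper's code or trained models: the weights are set by hand, the gates are pushed into saturation with a hypothetical gain `k`, the input encoding (`+1` for `a`, `-1` for `b`) is invented for the example, and the GRU update gate `z` is held constant for simplicity.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_count(string, k=20.0):
    """One-unit LSTM-style cell with hand-set weights (an assumption).

    Cell update: c_t = f * c_{t-1} + i * g
    With f and i saturated near 1 and g near +1 or -1, the cell state
    acts as an up/down counter.
    """
    c = 0.0
    trace = []
    for ch in string:
        x = 1.0 if ch == "a" else -1.0  # hypothetical input encoding
        f = sigmoid(k)        # forget gate near 1: keep the running count
        i = sigmoid(k)        # input gate near 1: always write
        g = math.tanh(k * x)  # candidate near +1 for 'a', near -1 for 'b'
        c = f * c + i * g
        trace.append(c)
    return trace


def gru_state(string, k=20.0, z=0.5):
    """One-unit GRU-style update with a fixed update gate z (a simplification).

    State update: h_t = (1 - z) * h_{t-1} + z * h_tilde
    This is a convex combination of values in [-1, 1], so h_t cannot
    grow with the number of symbols seen.
    """
    h = 0.0
    trace = []
    for ch in string:
        x = 1.0 if ch == "a" else -1.0
        h_tilde = math.tanh(k * x)
        h = (1.0 - z) * h + z * h_tilde
        trace.append(h)
    return trace


if __name__ == "__main__":
    print([round(c) for c in lstm_count("aaabbb")])
    print(gru_state("aaabbb"))
```

On `aaabbb` the rounded LSTM cell state reads 1, 2, 3, 2, 1, 0, so a return to zero marks a balanced string. The GRU-style state stays inside [-1, 1] however long the input is, which is one commonly cited intuition for why exact counting is harder for it.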

