LG NE ML | Oct 29, 2018

Counting in Language with RNNs

Heng xin Fun, Sergiy V Bokhnyak, Francesco Saverio Zuppichini

arXiv:1810.12411v2 0.8

Originality Synthesis-oriented

AI Analysis

This paper addresses a specific performance gap between RNN architectures on NLP tasks, but it is incremental, building on existing knowledge of LSTM and GRU mechanisms.

The paper investigates why LSTMs outperform GRUs in language modeling and machine translation, attributes the gap to better counting ability, and demonstrates this by analyzing simplified languages, namely Context-Free and Context-Sensitive Languages.

In this paper we examine a possible reason why the LSTM outperforms the GRU on language modeling and, more specifically, machine translation. We hypothesize that the reason is counting, a consistent theme across the literature on long-term dependence, counting, and language modeling for RNNs. Using simplified forms of language -- Context-Free and Context-Sensitive Languages -- we show exactly how the LSTM counts using its cell state during inference and why the GRU cannot perform as well.
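To make the counting idea concrete, here is a minimal sketch on the context-free language a^n b^n (n >= 1). It is not the paper's code or experiment: the gate values are hand-set rather than learned, and the function names (`lstm_style_count`, `gru_style_state`, `accepts_anbn`) are hypothetical. It assumes an idealized LSTM cell with saturated gates, so the cell state moves by +1 on each `a` and -1 on each `b`, and contrasts it with a GRU-style update, whose state is an interpolation between bounded values and so stays within [-1, 1].

```python
import math


def lstm_style_count(s):
    """Trace an idealized LSTM cell state c_t = f * c_{t-1} + i * g over s."""
    c = 0.0
    trace = []
    for ch in s:
        f, i = 1.0, 1.0                   # assumption: forget and input gates saturated at 1
        g = 1.0 if ch == "a" else -1.0    # assumption: candidate saturated at +1 / -1
        c = f * c + i * g                 # additive update, so c is an unbounded counter
        trace.append(c)
    return trace


def gru_style_state(s, z=0.5):
    """Trace a GRU-style state h_t = (1 - z) * h_{t-1} + z * h_tilde over s."""
    h = 0.0
    trace = []
    for ch in s:
        h_tilde = math.tanh(5.0 if ch == "a" else -5.0)  # candidate lies in (-1, 1)
        h = (1.0 - z) * h + z * h_tilde   # interpolation keeps h inside [-1, 1]
        trace.append(h)
    return trace


def accepts_anbn(s):
    """Accept a^n b^n (n >= 1) by reading the idealized cell-state trace."""
    if not s or any(ch not in "ab" for ch in s):
        return False
    seen_b = False
    for ch, c in zip(s, lstm_style_count(s)):
        if ch == "b":
            seen_b = True
        elif seen_b:          # an 'a' after a 'b' breaks the a...ab...b shape
            return False
        if c < 0:             # more b's than a's so far
            return False
    return lstm_style_count(s)[-1] == 0.0  # counts balance exactly at the end


if __name__ == "__main__":
    print(lstm_style_count("aaabbb"))
    print(accepts_anbn("aaabbb"), accepts_anbn("aabbb"))
    print(max(abs(h) for h in gru_style_state("a" * 50)))
```

In this toy setting the LSTM-style state grows linearly with the number of `a`s, while the GRU-style state saturates below 1 however long the input is, which is one intuition for why a bounded state makes exact counting harder.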
