LG · May 31, 2021

Learning and Generalization in RNNs

arXiv:2106.00047v1 · 3 citations
Originality: Incremental advance
AI Analysis

This paper addresses a foundational gap in machine learning theory for sequence modeling, though the advance is incremental because it builds on prior work.

The paper tackles the lack of theoretical understanding of RNNs by proving that they can learn general functions of sequences, not just sums of functions of individual tokens, and illustrates the result on regular language recognition problems.

Simple recurrent neural networks (RNNs) and their more advanced cousins LSTMs etc. have been very successful in sequence modeling. Their theoretical understanding, however, is lacking and has not kept pace with the progress for feedforward networks, where a reasonably complete understanding in the special case of highly overparametrized one-hidden-layer networks has emerged. In this paper, we make progress towards remedying this situation by proving that RNNs can learn functions of sequences. In contrast to the previous work that could only deal with functions of sequences that are sums of functions of individual tokens in the sequence, we allow general functions. Conceptually and technically, we introduce new ideas which enable us to extract information from the hidden state of the RNN in our proofs -- addressing a crucial weakness in previous work. We illustrate our results on some regular language recognition problems.
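To make the distinction in the abstract concrete, here is a minimal sketch contrasting a target that is a sum of per-token functions with a general sequence function. The choice of language (binary strings with an even number of 1s), the helper names, and the per-position functions are illustrative assumptions, not taken from the paper. For any additive target on length-2 inputs, the quantity f(0,0) + f(1,1) - f(0,1) - f(1,0) is zero, so a nonzero value shows that the target cannot be written as such a sum.

```python
def additive_target(tokens, per_position):
    """Restricted target class: a sum of functions of individual tokens.
    (Hypothetical helper; each position gets its own function here.)"""
    return sum(f(t) for f, t in zip(per_position, tokens))


def even_ones(tokens):
    """General target: membership in a regular language, computed by a
    two-state DFA that tracks the parity of the 1s seen so far.
    (Illustrative language choice, not necessarily one used in the paper.)"""
    state = 0
    for t in tokens:
        state ^= t
    return 1 if state == 0 else 0


def interaction(target):
    """Zero for every additive target on length-2 binary inputs."""
    return target((0, 0)) + target((1, 1)) - target((0, 1)) - target((1, 0))


# An arbitrary additive target, chosen only for illustration.
additive = lambda seq: additive_target(seq, [lambda t: 3 * t, lambda t: t - 1])

print(interaction(additive))   # 0
print(interaction(even_ones))  # 2
```

The nonzero interaction for `even_ones` means no choice of per-token functions reproduces it, which is the kind of target the abstract says earlier analyses could not handle.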

