NELGDec 24, 2014

Learning Longer Memory in Recurrent Neural Networks

arXiv:1412.7753v288 citations
Originality Incremental advance
AI Analysis

This addresses the vanishing gradient issue for researchers and practitioners in sequence modeling, offering a simpler alternative to complex architectures like LSTM.

The paper tackled the problem of training recurrent neural networks for longer-term patterns by introducing a structural modification that encourages slow state changes, achieving similar performance to LSTM networks in language modeling experiments.

Recurrent neural network is a powerful model that learns temporal patterns in sequential data. For a long time, it was believed that recurrent networks are difficult to train using simple optimizers, such as stochastic gradient descent, due to the so-called vanishing gradient problem. In this paper, we show that learning longer term patterns in real data, such as in natural language, is perfectly possible using gradient descent. This is achieved by using a slight structural modification of the simple recurrent neural network architecture. We encourage some of the hidden units to change their state slowly by making part of the recurrent weight matrix close to identity, thus forming kind of a longer term memory. We evaluate our model in language modeling experiments, where we obtain similar performance to the much more complex Long Short Term Memory (LSTM) networks (Hochreiter & Schmidhuber, 1997).

Code Implementations5 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes