NE

Sep 8, 2014

Recurrent Neural Network Regularization

Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals

arXiv:1409.2329v5

3002 citations

Originality: Incremental advance

AI Analysis

This paper addresses poor regularization in RNNs and LSTMs, giving machine learning researchers and practitioners a way to improve performance on sequence-based tasks.

The paper tackles overfitting in Recurrent Neural Networks (RNNs) with LSTM units. It introduces a simple regularization technique that applies dropout correctly, which substantially reduces overfitting on language modeling, speech recognition, image caption generation, and machine translation.

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper, we show how to correctly apply dropout to LSTMs, and show that it substantially reduces overfitting on a variety of tasks. These tasks include language modeling, speech recognition, image caption generation, and machine translation.
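The abstract does not spell out what "correctly" means. In the paper, dropout is applied only to the non-recurrent connections (the input to each LSTM layer and the output of the top layer), while the hidden and cell states carried from one timestep to the next are left untouched. The following is a minimal pure-Python sketch of that placement, not the authors' code: the function names (`dropout`, `lstm_step`, `run_layer`, `run_stack`), the stacked gate layout in `W`, and the use of inverted dropout scaling are all assumptions made for illustration.

```python
import math
import random


def dropout(vec, rate, rng, train=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not train or rate == 0.0:
        return list(vec)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in vec]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [x; h_prev] to four stacked gate pre-activations
    (input, forget, output, candidate); this layout is an assumption."""
    n = len(h_prev)
    z = [a + bias for a, bias in zip(matvec(W, x + h_prev), b)]
    i = [sigmoid(v) for v in z[0:n]]
    f = [sigmoid(v) for v in z[n:2 * n]]
    o = [sigmoid(v) for v in z[2 * n:3 * n]]
    g = [math.tanh(v) for v in z[3 * n:4 * n]]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def run_layer(inputs, W, b, hidden, rate, rng, train=True):
    """Run one LSTM layer over a sequence of input vectors (lists of floats)."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    outputs = []
    for x in inputs:
        # Non-recurrent connection (input from the layer below): dropout applied.
        x_in = dropout(x, rate, rng, train)
        # Recurrent connection: h and c flow between timesteps with no dropout.
        h, c = lstm_step(x_in, h, c, W, b)
        outputs.append(h)
    return outputs


def run_stack(inputs, layers, rate, rng, train=True):
    """Stack layers given as (W, b, hidden) tuples; drop the top layer's output too."""
    seq = inputs
    for W, b, hidden in layers:
        seq = run_layer(seq, W, b, hidden, rate, rng, train)
    return [dropout(h, rate, rng, train) for h in seq]
```

Calling `run_stack` with `train=False` disables every mask, so evaluation uses the full network. Because the mask never touches `h` or `c` between timesteps, information stored in the memory cell is not repeatedly corrupted over long sequences, which is the motivation the paper gives for this placement.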
