Recurrent Neural Networks and Long Short-Term Memory Networks: Tutorial and Survey
This paper serves as an educational resource for researchers and practitioners in machine learning, but its contribution is incremental: it compiles existing knowledge rather than presenting new results.
This tutorial paper explains the fundamentals and variants of Recurrent Neural Networks (RNNs) and Long Short-Term Memory Networks (LSTMs), addressing problems like gradient vanishing and explosion in sequence modeling.
This is a tutorial paper on Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), and their variants. We start with a dynamical system and backpropagation through time for RNNs. Then, we discuss the problems of gradient vanishing and explosion in long-term dependencies. We explain the close-to-identity weight matrix, long delays, leaky units, and echo state networks as ways to address these problems. Then, we introduce LSTM gates and cells, the history and variants of LSTM, and Gated Recurrent Units (GRU). Finally, we introduce bidirectional RNN, bidirectional LSTM, and the Embeddings from Language Models (ELMo) network, for processing a sequence in both directions.
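As a rough illustration of the gradient vanishing and explosion problem mentioned above, the following sketch uses a scalar, linear recurrence h_t = w * h_{t-1} + u * x_t. This is a simplifying assumption made here for illustration, not the paper's formulation: there is no activation function, and the names `linear_rnn_states`, `grad_last_wrt_first`, `w`, and `u` are hypothetical. In this setting the gradient of the last state with respect to the first is simply w multiplied by itself once per time step.

```python
def linear_rnn_states(w, u, xs, h0=0.0):
    """Unroll the scalar recurrence h_t = w * h_{t-1} + u * x_t.

    Returns the list [h_0, h_1, ..., h_T]. The scalar, linear form is an
    illustrative assumption, not the model defined in the paper.
    """
    hs = [h0]
    for x in xs:
        hs.append(w * hs[-1] + u * x)
    return hs


def grad_last_wrt_first(w, num_steps):
    """Chain rule through time: d h_T / d h_0 for the linear recurrence.

    Each step contributes a factor d h_t / d h_{t-1} = w, so the result is
    w raised to the power num_steps.
    """
    grad = 1.0
    for _ in range(num_steps):
        grad *= w
    return grad


if __name__ == "__main__":
    # |w| < 1 shrinks the gradient, |w| > 1 grows it, w = 1 preserves it.
    for w in (0.5, 1.0, 1.5):
        print(f"w = {w}: gradient over 50 steps = {grad_last_wrt_first(w, 50):.3e}")
```

With a recurrent weight below one in magnitude the gradient shrinks toward zero over many steps, and above one it grows without bound. A weight of exactly one leaves it unchanged, which is the scalar analogue of the close-to-identity weight matrix idea listed in the abstract.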