CV · LG · Nov 25, 2019

Gating Revisited: Deep Multi-layer RNNs That Can Be Trained

arXiv:1911.11033v4 · 67 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of building deeper and more efficient RNNs for sequence modeling, representing an incremental improvement over existing gated cells like LSTM and GRU.

The authors tackled the problem of training deep multi-layer RNNs by proposing a new gated recurrent cell (STAR) that reduces parameters and mitigates vanishing/exploding gradients, leading to improved performance and computational efficiency on sequence modeling tasks.
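The parameter claim can be checked by counting weights per layer. The sketch below makes a few assumptions. It uses one bias vector per gate block; PyTorch's `nn.LSTM` and `nn.GRU` keep two (`b_ih` and `b_hh`), which adds `h` per block. It also uses a STAR layout as we recall it from the paper, not from this summary: a candidate computed from the layer input alone, plus a single gate that sees both the input and the previous state. The function names are hypothetical.

```python
def lstm_params(i, h):
    # Four blocks (input, forget, output gates + candidate), each with an
    # input matrix (h x i), a recurrent matrix (h x h) and one bias (h).
    return 4 * (h * i + h * h + h)


def gru_params(i, h):
    # Three blocks (update gate, reset gate, candidate), same per-block shapes.
    return 3 * (h * i + h * h + h)


def star_params(i, h):
    # Assumed STAR layout: candidate W_z (h x i) + b_z (h);
    # gate W_x (h x i) + W_h (h x h) + b_k (h). No recurrent matrix on the candidate.
    return 2 * h * i + h * h + 2 * h


# For i = h = 128: LSTM 131584, GRU 98688, STAR 49408 parameters per layer.
for i, h in [(128, 128), (256, 512)]:
    print(f"i={i:4d} h={h:4d}  LSTM={lstm_params(i, h):8d}  "
          f"GRU={gru_params(i, h):8d}  STAR={star_params(i, h):8d}")
```

Under these assumptions the saving comes mostly from dropping recurrent matrices: STAR keeps one `h x h` matrix where GRU keeps three and LSTM four, so the gap widens as the hidden size grows.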

We propose a new STAckable Recurrent cell (STAR) for recurrent neural networks (RNNs), which has fewer parameters than widely used LSTM and GRU while being more robust against vanishing or exploding gradients. Stacking recurrent units into deep architectures suffers from two major limitations: (i) many recurrent cells (e.g., LSTMs) are costly in terms of parameters and computation resources; and (ii) deep RNNs are prone to vanishing or exploding gradients during training. We investigate the training of multi-layer RNNs and examine the magnitude of the gradients as they propagate through the network in the "vertical" direction. We show that, depending on the structure of the basic recurrent unit, the gradients are systematically attenuated or amplified. Based on our analysis we design a new type of gated cell that better preserves gradient magnitude. We validate our design on a large number of sequence modelling tasks and demonstrate that the proposed STAR cell makes it possible to build and train deeper recurrent architectures, ultimately leading to improved performance while being computationally more efficient.
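The abstract does not state the cell's equations. The following minimal sketch assumes the STAR update as we recall it from later versions of the paper: candidate z = tanh(W_z x + b_z), gate k = σ(W_x x + W_h h_{t-1} + b_k), and output h = tanh((1 − k) ∘ h_{t-1} + k ∘ z), where x is the state of the layer below. Please check it against the paper before relying on it. The code runs one time step through a stack of such layers. It measures the "vertical" gradient by chaining the per-layer Jacobians ∂h^l/∂h^{l−1}, holding each layer's previous state fixed. The class and function names, the Gaussian initialization, and the sizes are hypothetical choices for illustration, not the authors' code.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


class StarCellSketch:
    """One layer of a STAR-like cell; the equations are an assumption (see lead-in)."""

    def __init__(self, input_size, hidden_size, rng):
        # Hypothetical init: zero-mean Gaussians scaled by fan-in, zero biases.
        s_in = 1.0 / np.sqrt(input_size)
        s_h = 1.0 / np.sqrt(hidden_size)
        self.W_z = rng.normal(0.0, s_in, (hidden_size, input_size))
        self.W_x = rng.normal(0.0, s_in, (hidden_size, input_size))
        self.W_h = rng.normal(0.0, s_h, (hidden_size, hidden_size))
        self.b_z = np.zeros(hidden_size)
        self.b_k = np.zeros(hidden_size)

    def step(self, x, h_prev):
        """One time step: x is the layer input h^{l-1}_t, h_prev is h^l_{t-1}."""
        z = np.tanh(self.W_z @ x + self.b_z)                      # candidate state
        k = sigmoid(self.W_x @ x + self.W_h @ h_prev + self.b_k)  # the single gate
        h = np.tanh((1.0 - k) * h_prev + k * z)                   # blend, then squash
        return h, z, k

    def input_jacobian(self, x, h_prev):
        """Vertical Jacobian d h^l_t / d h^{l-1}_t, holding h^l_{t-1} fixed."""
        h, z, k = self.step(x, h_prev)
        dz = (1.0 - z ** 2)[:, None] * self.W_z             # d z / d x
        dk = (k * (1.0 - k))[:, None] * self.W_x            # d k / d x
        du = (z - h_prev)[:, None] * dk + k[:, None] * dz   # d (pre-tanh blend) / d x
        return (1.0 - h ** 2)[:, None] * du                 # chain through the outer tanh


def stack_forward(cells, x, h_prevs):
    """Push one time step's input up through all layers and return the top state."""
    inp = x
    for cell, h_prev in zip(cells, h_prevs):
        inp, _, _ = cell.step(inp, h_prev)
    return inp


def vertical_jacobian(cells, x, h_prevs):
    """Multiply per-layer Jacobians from the network input up to the top layer."""
    J = np.eye(x.shape[0])
    inp = x
    for cell, h_prev in zip(cells, h_prevs):
        J = cell.input_jacobian(inp, h_prev) @ J
        inp, _, _ = cell.step(inp, h_prev)
    return inp, J


rng = np.random.default_rng(0)
depth, d_in, hidden = 8, 4, 16
cells = [StarCellSketch(d_in if layer == 0 else hidden, hidden, rng) for layer in range(depth)]
x = rng.normal(size=d_in)
h_prevs = [rng.normal(scale=0.1, size=hidden) for _ in range(depth)]  # h^l_{t-1} per layer

top, J = vertical_jacobian(cells, x, h_prevs)
print(f"spectral norm of d h^top / d x over {depth} layers: {np.linalg.norm(J, 2):.3e}")
```

Swapping `step` and `input_jacobian` for another cell, such as a plain tanh RNN layer or an LSTM, and sweeping `depth` gives a rough way to see how the vertical gradient norm grows or shrinks with depth. The sketch only shows how to measure this. It does not reproduce the paper's analysis or results.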

Code Implementations: 3 repos
