LG, AI | Sep 12, 2017

RRA: Recurrent Residual Attention for Sequence Learning

arXiv:1709.03714v1 | 9 citations
Originality: Incremental advance
AI Analysis

This work addresses sequence learning challenges for AI applications, but it is incremental as it builds on existing RNN and attention methods.

The paper tackles learning long-range dependencies in sequential data by proposing a recurrent neural network with residual attention (RRA). Compared to a standard LSTM, RRA improved performance, convergence speed, and training stability, and it showed competitive results on the adding problem, pixel-by-pixel MNIST classification, and IMDB sentiment analysis.

In this paper, we propose a recurrent neural network (RNN) with residual attention (RRA) to learn long-range dependencies from sequential data. We propose to add residual connections across timesteps to the RNN, which explicitly enhances the interaction between the current state and hidden states that are several timesteps apart. This also allows training errors to be directly back-propagated through residual connections and effectively alleviates the vanishing gradient problem. We further reformulate an attention mechanism over residual connections. An attention gate is defined to summarize the individual contributions of multiple previous hidden states in computing the current state. We evaluate RRA on three tasks: the adding problem, pixel-by-pixel MNIST classification, and sentiment analysis on the IMDB dataset. Our experiments demonstrate that RRA yields better performance, faster convergence, and more stable training compared to a standard LSTM network. Furthermore, RRA shows performance that is highly competitive with state-of-the-art methods.
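The idea in the abstract (residual connections across timesteps, weighted by an attention gate over several earlier hidden states) can be illustrated with a minimal sketch. This is not the paper's formulation: the abstract gives no equations, so the plain tanh cell (instead of an LSTM), the softmax over learned logits `attn_logits`, the window size `K`, and the choice to add the attended summary to the previous hidden state before the recurrent transform are all assumptions made for illustration. The function and parameter names are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; the scale is an arbitrary illustrative choice."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def rra_forward(xs, W_x, W_h, attn_logits, K):
    """Run a tanh RNN whose recurrent input is h_{t-1} plus an
    attention-weighted sum of up to K older hidden states (assumed form)."""
    hidden = len(W_h)
    history = [[0.0] * hidden]  # h_0 = 0
    for x in xs:
        h_prev = history[-1]
        older = history[-(K + 1):-1]  # up to K states that precede h_{t-1}
        h_in = list(h_prev)
        if older:
            # One weight per available older state, most recent first.
            weights = softmax(attn_logits[:len(older)])
            for a, h_old in zip(weights, reversed(older)):
                for j in range(hidden):
                    h_in[j] += a * h_old[j]  # residual path across timesteps
        pre = [u + v for u, v in zip(matvec(W_x, x), matvec(W_h, h_in))]
        history.append([math.tanh(p) for p in pre])
    return history[1:]  # drop the initial zero state


rng = random.Random(0)
hidden, input_dim, K = 4, 2, 3
W_x = init_matrix(hidden, input_dim, rng)
W_h = init_matrix(hidden, hidden, rng)
attn_logits = [0.0] * K  # uniform attention before any training
xs = [[rng.random(), 0.0] for _ in range(10)]
states = rra_forward(xs, W_x, W_h, attn_logits, K)
print(len(states), len(states[0]))
```

In this sketch the older states reach the current step through a short additive path, which is the property the abstract credits with easing gradient flow; a trained model would learn `attn_logits` (or compute them from the states) rather than fix them at zero.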

Code Implementations: 1 repo
