NELG · Jun 24, 2016

Sampling-based Gradient Regularization for Capturing Long-Term Dependencies in Recurrent Neural Networks

arXiv:1606.07767v3 · 5 citations
Originality: Incremental advance
AI Analysis

This work addresses a fundamental bottleneck in training RNNs for tasks that require long-term memory, such as sequence modeling, although it is an incremental improvement over existing gradient regularization methods.

The paper tackles the vanishing/exploding gradient problem in recurrent neural networks by proposing a sampling-based gradient regularization technique. It estimates each training example's contribution to the norm of the long-term gradient components and uses these estimates to build mini-batches for SGD, enabling effective training on long-term dependencies. On synthetic benchmarks, the trained networks detect links between events in temporal sequences at ranges of approximately 100 or longer.
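For intuition about why long-range gradients vanish or explode, here is the standard analysis for a vanilla tanh RNN. This is an assumption for illustration; the paper's exact architecture and notation are not given on this page. With $h_t = \tanh(W h_{t-1} + U x_t + b)$ and $D_t = \operatorname{diag}(1 - h_t \odot h_t)$, the Jacobian of the state $T-k$ steps ahead is an ordered product of per-step factors (later steps on the left). Because every diagonal entry of $D_t$ lies in $(0, 1]$, its spectral norm is bounded by a power of $\sigma_{\max}(W)$:

```latex
\frac{\partial h_T}{\partial h_k}
  = \prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}}
  = D_T W \, D_{T-1} W \cdots D_{k+1} W,
\qquad
\left\lVert \frac{\partial h_T}{\partial h_k} \right\rVert_2
  \le \prod_{t=k+1}^{T} \lVert D_t \rVert_2 \, \lVert W \rVert_2
  \le \sigma_{\max}(W)^{\,T-k}.
```

If $\sigma_{\max}(W) < 1$, the long-term gradient shrinks exponentially: for $\sigma_{\max}(W) = 0.9$ and $T - k = 100$ the bound is $0.9^{100} \approx 2.7 \times 10^{-5}$. Conversely, $\sigma_{\max}(W) > 1$ is necessary (though not sufficient) for the gradient to explode, which is why keeping the long-term gradient norm in a usable range matters for dependencies spanning around 100 steps.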

The vanishing (and exploding) gradient effect is a common problem for recurrent neural networks with nonlinear activation functions that use backpropagation to compute derivatives. Deep feedforward neural networks with many hidden layers also suffer from this effect. In this paper we propose a novel universal technique that keeps the norm of the gradient within a suitable range. We construct a way to estimate the contribution of each training example to the norm of the long-term components of the target function's gradient. Using this subroutine, we can construct mini-batches for stochastic gradient descent (SGD) training that lead to high performance and accuracy of the trained network even for very complex tasks. We provide a straightforward mathematical estimate of a mini-batch's impact on the gradient norm and prove its correctness theoretically. To check our framework experimentally, we use special synthetic benchmarks that test the ability of RNNs to capture long-term dependencies. Our network can detect links between events in a (temporal) sequence at ranges of approximately 100 and longer.
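The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea under loudly stated assumptions: a vanilla tanh RNN with scalar inputs; a hypothetical per-example proxy `long_term_grad_norm`, defined as the norm of a unit gradient injected at the last hidden state after backpropagation through time to the initial state; and a hypothetical `build_minibatch` that samples examples whose proxy falls in a target range `[low, high]`, topping up with the closest out-of-range examples. These names, the proxy, and the range are illustrative and are not the authors' method.

```python
import math
import random


def forward(W, U, xs):
    """Run a vanilla tanh RNN h_t = tanh(W h_{t-1} + U x_t) from h_0 = 0; return every state."""
    H = len(W)
    h = [0.0] * H
    states = [h]
    for x in xs:
        h = [math.tanh(sum(W[i][j] * h[j] for j in range(H)) + U[i] * x) for i in range(H)]
        states.append(h)
    return states


def long_term_grad_norm(W, U, xs):
    """Hypothetical proxy: norm of a unit gradient at h_T after backpropagating to h_0."""
    H = len(W)
    states = forward(W, U, xs)
    v = [1.0 / math.sqrt(H)] * H  # unit-norm gradient injected at the last step
    for t in range(len(xs), 0, -1):
        d = [1.0 - s * s for s in states[t]]  # tanh derivative at step t (diagonal of D_t)
        # Gradient w.r.t. h_{t-1} is W^T (d * v), i.e. the transposed step Jacobian applied to v.
        v = [sum(W[i][j] * d[i] * v[i] for i in range(H)) for j in range(H)]
    return math.sqrt(sum(g * g for g in v))


def build_minibatch(scores, batch_size, low, high, rng):
    """Hypothetical selector: sample in-range examples, else top up with the closest others."""
    inside = [i for i, s in enumerate(scores) if low <= s <= high]
    if len(inside) >= batch_size:
        return rng.sample(inside, batch_size)
    inside_set = set(inside)

    def distance(i):
        # How far an out-of-range score lies from the target interval.
        s = scores[i]
        return low - s if s < low else s - high

    outside = sorted((i for i in range(len(scores)) if i not in inside_set), key=distance)
    return inside + outside[: batch_size - len(inside)]


# Usage on random data: 32 sequences of length 100, hidden size 4.
rng = random.Random(0)
H = 4
W = [[rng.gauss(0.0, 0.5) for _ in range(H)] for _ in range(H)]
U = [rng.gauss(0.0, 1.0) for _ in range(H)]
data = [[rng.gauss(0.0, 1.0) for _ in range(100)] for _ in range(32)]
scores = [long_term_grad_norm(W, U, xs) for xs in data]
batch = build_minibatch(scores, batch_size=8, low=1e-3, high=1e1, rng=rng)
print(len(batch))  # → 8
```

As a quick sanity check of the backward loop: with a scalar RNN, zero input, and weight w, the proxy reduces to |w|^T, e.g. 0.125 for w = 0.5 and T = 3. Sampling inside a range, rather than simply taking the largest scores, is one plausible way to avoid over-weighting examples whose gradients are already close to exploding.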
