LG Dec 13, 2016

DizzyRNN: Reparameterizing Recurrent Neural Networks for Norm-Preserving Backpropagation

Victor Dorobantu, Per Andre Stromhaug, Jess Renteria

arXiv:1612.04035v16.225 citations

Originality: Incremental advance

AI Analysis

This work addresses a fundamental issue in training recurrent neural networks for long-term dependencies, offering a novel solution that could benefit applications in sequence modeling, though it appears incremental relative to existing orthogonal methods.

The authors tackled the vanishing and exploding gradient problems in recurrent neural networks by proposing a reparameterization using Givens rotations and absolute value non-linearity to ensure norm-preserving backpropagation. They demonstrated that this approach reduces parameters and outperforms standard RNNs with orthogonal initializations and LSTM networks on the copy problem.

The vanishing and exploding gradient problems are well-studied obstacles that make it difficult for recurrent neural networks to learn long-term time dependencies. We propose a reparameterization of standard recurrent neural networks to update linear transformations in a provably norm-preserving way through Givens rotations. Additionally, we use the absolute value function as an element-wise non-linearity to preserve the norm of backpropagated signals over the entire network. We show that this reparameterization reduces the number of parameters and maintains the same algorithmic complexity as a standard recurrent neural network, while outperforming standard recurrent neural networks with orthogonal initializations and Long Short-Term Memory networks on the copy problem.
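The two norm-preserving ingredients named in the abstract can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names, the choice of index pairs, and the random angles are all assumptions made for illustration. It applies a few Givens rotations to a vector and then backpropagates a gradient through an element-wise absolute value, confirming that both steps leave the Euclidean norm unchanged.

```python
import numpy as np


def givens_rotate(h, i, j, theta):
    """Rotate coordinates i and j of h by angle theta (one Givens rotation)."""
    out = h.copy()
    c, s = np.cos(theta), np.sin(theta)
    out[i] = c * h[i] - s * h[j]
    out[j] = s * h[i] + c * h[j]
    return out


def abs_backward(pre_activation, grad_out):
    """Backpropagate through |x|: the local derivative is sign(x), i.e. +1 or -1.

    At exactly x == 0 np.sign returns 0; that case has probability zero for
    the continuous random inputs used here.
    """
    return np.sign(pre_activation) * grad_out


rng = np.random.default_rng(0)
h = rng.standard_normal(8)

# Hypothetical pairing of coordinates; the source does not specify a layout.
rotated = h
for i, j in [(0, 1), (2, 3), (4, 5), (6, 7)]:
    rotated = givens_rotate(rotated, i, j, rng.uniform(0.0, 2.0 * np.pi))

grad_out = rng.standard_normal(8)
grad_in = abs_backward(rotated, grad_out)

norm_before = np.linalg.norm(h)
norm_after = np.linalg.norm(rotated)
grad_norm_out = np.linalg.norm(grad_out)
grad_norm_in = np.linalg.norm(grad_in)
```

Each Givens rotation carries a single angle parameter and touches only two coordinates, which is consistent with the abstract's claim that the reparameterization reduces the parameter count relative to a dense weight matrix.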
