ML · LG · Jul 29, 2017

Orthogonal Recurrent Neural Networks with Scaled Cayley Transform

arXiv:1707.09520v3 · 150 citations
AI Analysis

The paper addresses vanishing and exploding gradients in RNNs for sequential-data tasks and offers a simpler alternative to unitary RNNs. The contribution is incremental, since it builds on existing orthogonal methods.

The paper tackles vanishing and exploding gradients in RNNs by proposing a scaled Cayley orthogonal RNN (scoRNN), which keeps the recurrent weight matrix orthogonal without complex-valued matrices. In the reported experiments it achieves superior results with fewer trainable parameters than other unitary RNNs.

Recurrent Neural Networks (RNNs) are designed to handle sequential data but suffer from vanishing or exploding gradients. Unitary Recurrent Neural Networks (uRNNs) have recently been used to address this issue and, in some cases, exceed the capabilities of Long Short-Term Memory networks (LSTMs). We propose a simpler and novel update scheme that maintains orthogonal recurrent weight matrices without using complex-valued matrices. This is done by parametrizing the recurrent weight matrix with a skew-symmetric matrix through the Cayley transform. Such a parametrization cannot represent matrices that have negative one as an eigenvalue, but this limitation is overcome by scaling the recurrent weight matrix by a diagonal matrix consisting of ones and negative ones. The proposed training scheme involves a straightforward gradient calculation and update step. In several experiments, the proposed scaled Cayley orthogonal recurrent neural network (scoRNN) achieves superior results with fewer trainable parameters than other unitary RNNs.
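
The parametrization described in the abstract can be illustrated with a small NumPy sketch. It builds W = (I + A)^{-1}(I - A)D from a skew-symmetric A and a diagonal D with entries ±1, then checks that W is orthogonal. The function name `scaled_cayley`, the matrix size `n`, and the count `rho` of negative ones are illustrative assumptions, not taken from the authors' code, and the sketch covers only the construction of W, not the training update.

```python
import numpy as np


def scaled_cayley(A, D):
    """Return W = (I + A)^{-1} (I - A) D.

    A is assumed skew-symmetric (A.T == -A) and D diagonal with +/-1 entries.
    """
    n = A.shape[0]
    I = np.eye(n)
    # Solve (I + A) X = (I - A) rather than forming an explicit inverse.
    # I + A is always invertible because A has purely imaginary eigenvalues.
    return np.linalg.solve(I + A, I - A) @ D


rng = np.random.default_rng(0)
n, rho = 6, 2                      # illustrative sizes (assumption)

M = rng.standard_normal((n, n))
A = M - M.T                        # any M - M.T is skew-symmetric

# Diagonal scaling matrix: rho entries of -1 followed by n - rho entries of +1.
D = np.diag(np.concatenate([-np.ones(rho), np.ones(n - rho)]))

W = scaled_cayley(A, D)

# Orthogonality check: W.T @ W should equal the identity up to round-off.
orth_error = np.linalg.norm(W.T @ W - np.eye(n))
print("||W^T W - I|| =", orth_error)

# Without scaling (D = I) the Cayley transform never has eigenvalue -1,
# so it cannot produce -I. With D = -I and A = 0 the scaled form can.
minus_identity = scaled_cayley(np.zeros((n, n)), -np.eye(n))
print("represents -I:", np.allclose(minus_identity, -np.eye(n)))
```

In a training loop, A would hold the trainable parameters and W would be recomputed from A after each update, so that W stays orthogonal by construction; D stays fixed.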
