LG | Aug 12, 2022

Orthogonal Gated Recurrent Unit with Neumann-Cayley Transformation

arXiv:2208.06496v1 | 7 citations | h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses stability and convergence issues in recurrent neural networks, particularly for tasks requiring long-term memory, though it appears incremental as it builds on existing orthogonal matrix approaches.

The authors tackled the exploding gradient problem in Gated Recurrent Units (GRUs) by integrating orthogonal matrices, proposing the Neumann-Cayley Orthogonal GRU (NC-GRU). Their experiments showed that NC-GRU significantly outperforms GRU and other RNNs on synthetic and real-world tasks.

In recent years, orthogonal matrices have been shown to be a promising way to improve the training, stability, and convergence of Recurrent Neural Networks (RNNs), particularly by controlling gradients. While Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) architectures address the vanishing gradient problem by using a variety of gates and memory cells, they are still prone to the exploding gradient problem. In this work, we analyze the gradients in GRU and propose using orthogonal matrices to prevent exploding gradients and enhance long-term memory. We study where to use orthogonal matrices, and we propose a Neumann series-based Scaled Cayley transformation for training orthogonal matrices in GRU, which we call Neumann-Cayley Orthogonal GRU, or simply NC-GRU. We present detailed experiments of our model on several synthetic and real-world tasks, which show that NC-GRU significantly outperforms GRU as well as several other RNNs.
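The abstract names a Neumann series-based Scaled Cayley transformation but does not give its formula. The sketch below illustrates the general idea under stated assumptions: it uses the commonly cited scaled Cayley form W = (I + A)^{-1}(I - A)D, with A skew-symmetric and D a diagonal matrix of ±1 entries, and replaces the matrix inverse with a truncated Neumann series, (I + A)^{-1} ≈ Σ_{k=0}^{K} (-A)^k, which converges only when the norm of A is below 1. The function names, the truncation order, and the example matrix are hypothetical and are not taken from the paper or its repository. Plain Python lists are used only to keep the sketch dependency-free.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y, sign=1.0):
    # Elementwise X + sign * Y.
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def neumann_inverse(A, order):
    """Approximate (I + A)^{-1} by sum_{k=0}^{order} (-A)^k.

    Assumption: the norm of A is below 1, otherwise the series diverges.
    """
    n = len(A)
    neg_A = [[-a for a in row] for row in A]
    term = identity(n)   # current power (-A)^k, starting at k = 0
    total = identity(n)  # running partial sum
    for _ in range(order):
        term = matmul(term, neg_A)
        total = add(total, term)
    return total

def neumann_cayley(A, d, order=20):
    """Approximate W = (I + A)^{-1} (I - A) D with D = diag(d), d[j] in {+1, -1}."""
    n = len(A)
    W = matmul(neumann_inverse(A, order), add(identity(n), A, sign=-1.0))
    # Right-multiplying by a diagonal matrix scales column j by d[j].
    return [[W[i][j] * d[j] for j in range(n)] for i in range(n)]

def orthogonality_error(W):
    """Largest absolute entry of W^T W - I."""
    n = len(W)
    G = matmul(transpose(W), W)
    I = identity(n)
    return max(abs(G[i][j] - I[i][j]) for i in range(n) for j in range(n))

# Hypothetical small skew-symmetric matrix (A^T = -A) with norm below 1.
A = [[0.0, 0.2, -0.1],
     [-0.2, 0.0, 0.3],
     [0.1, -0.3, 0.0]]
d = [1.0, -1.0, 1.0]

W = neumann_cayley(A, d, order=30)
err_high_order = orthogonality_error(W)
err_low_order = orthogonality_error(neumann_cayley(A, d, order=1))
```

Comparing `err_high_order` with `err_low_order` shows the trade-off in this kind of scheme: a longer series gives a matrix closer to orthogonal at the cost of more matrix products, while a short series is cheaper but only approximately orthogonal.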

Code Implementations: 1 repo
