LG | ML | Nov 18, 2019

Eigenvalue Normalized Recurrent Neural Networks for Short Term Memory

arXiv:1911.07964v1 | 8 citations
Originality: Incremental advance
AI Analysis

This paper addresses a problem in sequence modeling of interest to machine learning practitioners and offers an incremental improvement over existing orthogonal/unitary RNN variants.

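The paper observes that orthogonal/unitary RNNs retain all input information, which may waste model capacity. It proposes an architecture that adds a state driven by a recurrent matrix with eigenvalues in the unit disc, so that inputs dissipate over time and simulate short-term memory. The authors report that the method is highly competitive in several experiments.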

Several variants of recurrent neural networks (RNNs) with orthogonal or unitary recurrent matrices have recently been developed to mitigate the vanishing/exploding gradient problem and to model long-term dependencies of sequences. However, with the eigenvalues of the recurrent matrix on the unit circle, the recurrent state retains all input information which may unnecessarily consume model capacity. In this paper, we address this issue by proposing an architecture that expands upon an orthogonal/unitary RNN with a state that is generated by a recurrent matrix with eigenvalues in the unit disc. Any input to this state dissipates in time and is replaced with new inputs, simulating short-term memory. A gradient descent algorithm is derived for learning such a recurrent matrix. The resulting method, called the Eigenvalue Normalized RNN (ENRNN), is shown to be highly competitive in several experiments.
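The contrast the abstract draws between eigenvalues on the unit circle and eigenvalues in the unit disc can be illustrated with a small numerical sketch. This is not the paper's ENRNN or its gradient descent algorithm. It assumes a purely linear recurrence h_t = W h_{t-1} with no further input and no nonlinearity, and the helper names (`normalize_spectral_radius`, `state_norms`) and the target radius 0.9 are hypothetical choices made for illustration only.

```python
import numpy as np


def normalize_spectral_radius(W, radius=0.9):
    """Rescale W so its largest eigenvalue modulus equals `radius`.

    Hypothetical helper: a plain rescaling, not the paper's training scheme.
    """
    rho = np.max(np.abs(np.linalg.eigvals(W)))
    return W * (radius / rho)


def state_norms(W, h0, steps):
    """Norms of h_t = W h_{t-1} for t = 0..steps, with no new input."""
    h = h0.copy()
    norms = [np.linalg.norm(h)]
    for _ in range(steps):
        h = W @ h
        norms.append(np.linalg.norm(h))
    return np.array(norms)


rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))

# Orthogonal matrix: all eigenvalues lie on the unit circle.
Q, _ = np.linalg.qr(A)

# Rescaled matrix: all eigenvalues lie inside the unit disc (radius 0.9 assumed).
W_short = normalize_spectral_radius(A, radius=0.9)

h0 = rng.standard_normal(n)
long_norms = state_norms(Q, h0, steps=200)         # norm is preserved
short_norms = state_norms(W_short, h0, steps=200)  # norm decays toward zero

print("orthogonal, final/initial norm:", long_norms[-1] / long_norms[0])
print("unit disc,  final/initial norm:", short_norms[-1] / short_norms[0])
```

Under these assumptions, the orthogonal recurrence keeps the norm of the initial state unchanged, while the rescaled recurrence lets it fade, which is the behaviour the abstract describes as short-term memory.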

Code Implementations: 1 repo
