UnICORNN: A recurrent model for learning very long time dependencies
This addresses the exploding and vanishing gradient problem in RNNs for applications requiring long-term memory, such as time-series analysis or natural language processing, representing a novel method rather than an incremental improvement.
The paper tackled the challenge of processing sequential inputs with long-time dependencies in recurrent neural networks (RNNs) by proposing a novel architecture based on discretizing a Hamiltonian system of oscillators, resulting in state-of-the-art performance on tasks with very long-time dependencies.
The design of recurrent neural networks (RNNs) to accurately process sequential inputs with long-time dependencies is very challenging on account of the exploding and vanishing gradient problem. To overcome this, we propose a novel RNN architecture which is based on a structure preserving discretization of a Hamiltonian system of second-order ordinary differential equations that models networks of oscillators. The resulting RNN is fast, invertible (in time), memory efficient and we derive rigorous bounds on the hidden state gradients to prove the mitigation of the exploding and vanishing gradient problem. A suite of experiments are presented to demonstrate that the proposed RNN provides state of the art performance on a variety of learning tasks with (very) long-time dependencies.