Input-Output Equivalence of Unitary and Contractive RNNs
This work addresses a foundational question about the expressiveness of URNNs, of interest to researchers modeling long-term dependencies, though the contribution is incremental in that it builds on prior work on the vanishing and exploding gradient problem.
The paper asks whether unitary recurrent neural networks (URNNs) are as expressive as general contractive RNNs. It shows that, with ReLU activations, a URNN can match the input-output mapping of any contractive RNN using at most twice as many hidden states, but that this equivalence fails for certain smooth activations.
Unitary recurrent neural networks (URNNs) have been proposed as a method to overcome the vanishing and exploding gradient problem in modeling data with long-term dependencies. A basic question is how restrictive is the unitary constraint on the possible input-output mappings of such a network? This work shows that for any contractive RNN with ReLU activations, there is a URNN with at most twice the number of hidden states and the identical input-output mapping. Hence, with ReLU activations, URNNs are as expressive as general RNNs. In contrast, for certain smooth activations, it is shown that the input-output mapping of an RNN cannot be matched with a URNN, even with an arbitrary number of states. The theoretical results are supported by experiments on modeling of slowly-varying dynamical systems.
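The ReLU result can be illustrated numerically. The sketch below is not taken from the text above. It assumes one standard way to obtain such an equivalence: embed the contractive weight matrix W in a 2n x 2n orthogonal matrix (a unitary dilation), and give the n extra hidden states a strongly negative bias so that ReLU keeps them at zero for bounded inputs. The model form h_k = ReLU(W h_{k-1} + F x_k + b), y_k = C h_k, the helper names (`unitary_dilation`, `run_rnn`), the sizes, and the bias value -1e3 are all assumptions made for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def unitary_dilation(W):
    """Embed a contraction W (spectral norm <= 1) in a 2n x 2n orthogonal matrix."""
    U, s, Vt = np.linalg.svd(W)
    d = np.sqrt(np.clip(1.0 - s**2, 0.0, None))
    A = (U * d) @ U.T      # (I - W W^T)^{1/2}
    B = (Vt.T * d) @ Vt    # (I - W^T W)^{1/2}
    return np.block([[W, A], [B, -W.T]])

def run_rnn(W, F, b, C, xs):
    """Assumed model: h_k = relu(W h_{k-1} + F x_k + b), y_k = C h_k, h_0 = 0."""
    h = np.zeros(W.shape[0])
    ys = []
    for x in xs:
        h = relu(W @ h + F @ x + b)
        ys.append(C @ h)
    return np.array(ys)

# Hypothetical sizes and random parameters, chosen only for this illustration.
rng = np.random.default_rng(0)
n, m, p, T = 4, 2, 3, 50
W = rng.standard_normal((n, n))
W *= 0.9 / np.linalg.norm(W, 2)          # rescale to spectral norm 0.9 (contractive)
F = rng.standard_normal((n, m))
b = rng.standard_normal(n)
C = rng.standard_normal((p, n))
xs = rng.uniform(-1.0, 1.0, size=(T, m))  # bounded inputs

# URNN with 2n hidden states: the first n mirror the original RNN,
# the last n are held at zero by a large negative bias (assumed value).
Q = unitary_dilation(W)
F_u = np.vstack([F, np.zeros((n, m))])
b_u = np.concatenate([b, np.full(n, -1e3)])
C_u = np.hstack([C, np.zeros((p, n))])

y_rnn = run_rnn(W, F, b, C, xs)
y_urnn = run_rnn(Q, F_u, b_u, C_u, xs)
max_gap = np.max(np.abs(y_rnn - y_urnn))
print("orthogonal:", np.allclose(Q.T @ Q, np.eye(2 * n)))
print("max output gap:", max_gap)
```

The trick relies on ReLU mapping sufficiently negative pre-activations exactly to zero, which is why the extra states never feed back into the first n. A smooth activation has no such exact zero region, which is at least consistent with the negative result stated for certain smooth activations.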