Strongly-Typed Recurrent Neural Networks
This paper provides a principled approach to simplifying RNN design for sequential learning tasks, though it appears incremental rather than paradigm-shifting.
The paper tackles the problem of overly complicated recurrent neural network architectures by introducing strongly-typed RNNs, which combine type constraints borrowed from physics with a separation, borrowed from functional programming, between stateless and state-dependent computation. Experiments show that these constrained architectures achieve lower training error and comparable generalization error relative to classical RNNs.
Recurrent neural networks are increasingly popular models for sequential learning. Unfortunately, although the most effective RNN architectures are perhaps excessively complicated, extensive searches have not found simpler alternatives. This paper imports ideas from physics and functional programming into RNN design to provide guiding principles. From physics, we introduce type constraints, analogous to the constraint that forbids adding meters to seconds. From functional programming, we require that strongly-typed architectures factorize into stateless learnware and state-dependent firmware, reducing the impact of side-effects. The features learned by strongly-typed nets have a simple semantic interpretation via dynamic average-pooling on one-dimensional convolutions. We also show that strongly-typed gradients are better behaved than in classical architectures, and characterize the representational power of strongly-typed nets. Finally, experiments show that, despite being more constrained, strongly-typed architectures achieve lower training and comparable generalization error to classical architectures.
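
The learnware/firmware factorization described above can be illustrated with a minimal sketch in plain Python. The specific update rule used here, h_t = f_t * h_{t-1} + (1 - f_t) * z_t with z_t = W x_t and f_t = sigmoid(V x_t + b), is an assumption chosen to match the description of "dynamic average-pooling"; the function names, parameter names, and shapes are hypothetical and not taken from the text above.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def matvec(W, x):
    # Plain matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def learnware(xs, W, V, b):
    """Stateless part: holds all learned parameters and reads only the inputs x_t."""
    zs = [matvec(W, x) for x in xs]  # candidate features (assumed form: z_t = W x_t)
    fs = [[sigmoid(a + bi) for a, bi in zip(matvec(V, x), b)] for x in xs]  # gates in (0, 1)
    return zs, fs

def firmware(zs, fs, h0):
    """State-dependent part: no learned parameters, only elementwise gating."""
    h, hs = list(h0), []
    for z, f in zip(zs, fs):
        # Convex combination of the previous state and the new feature,
        # i.e. a running weighted average with input-dependent weights.
        h = [fi * hi + (1.0 - fi) * zi for fi, hi, zi in zip(f, h, z)]
        hs.append(h)
    return hs

def strongly_typed_rnn(xs, W, V, b, h0):
    zs, fs = learnware(xs, W, V, b)
    return firmware(zs, fs, h0)

# Usage: identity W and zero V, b make every gate exactly 0.5.
W = [[1.0, 0.0], [0.0, 1.0]]
V = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
xs = [[1.0, 2.0], [3.0, 4.0]]
states = strongly_typed_rnn(xs, W, V, b, h0=[0.0, 0.0])
print(states)
```

In this sketch the hidden state never passes through a weight matrix or a nonlinearity, so each state is a weighted average of the initial state and the per-step features. This is one way to read the claim that the learned features amount to dynamic average-pooling.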