Restricted Recurrent Neural Tensor Networks: Exploiting Word Frequency and Compositionality
This work addresses efficiency in language modeling, offering NLP practitioners a way to improve performance at lower resource cost. The contribution is incremental, since it builds on the existing RNTN architecture.
The paper addresses the computational and memory cost of increasing the capacity of recurrent neural networks. It introduces restricted recurrent neural tensor networks (r-RNTN), which use distinct hidden layer weights for frequent words and a single shared set of weights for infrequent ones. In perplexity evaluations, r-RNTNs outperform RNNs with the same hidden layer size while using only a small fraction of the parameters of unrestricted RNTNs.
Increasing the capacity of recurrent neural networks (RNNs) usually involves enlarging the hidden layer, with a significant increase in computational cost. Recurrent neural tensor networks (RNTNs) increase capacity by using distinct hidden layer weights for each word, but at a greater cost in memory usage. In this paper, we introduce restricted recurrent neural tensor networks (r-RNTNs), which reserve distinct hidden layer weights for frequent vocabulary words while sharing a single set of weights for infrequent words. Perplexity evaluations show that, for fixed hidden layer sizes, r-RNTNs improve language model performance over RNNs while using only a small fraction of the parameters of unrestricted RNTNs. These results also hold for r-RNTNs using Gated Recurrent Units and Long Short-Term Memory.
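The weight-sharing idea can be illustrated with a minimal sketch. The abstract does not give the exact recurrence, so this assumes a plain tanh RNN of the form h_t = tanh(W x_t + U_i h_{t-1} + b), where i is the index of the recurrent matrix selected by the current word. The function names, the toy corpus, the cutoff `K`, and all sizes are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math
import random
from collections import Counter


def build_matrix_index(tokens, k):
    """Map each word to the index of the recurrent matrix it uses.

    The k most frequent words get indices 0..k-1 (one matrix each);
    every other word maps to index k, the single shared matrix.
    """
    ranked = [word for word, _ in Counter(tokens).most_common()]
    return {word: (rank if rank < k else k) for rank, word in enumerate(ranked)}


def recurrent_param_count(hidden, num_matrices):
    """Number of recurrent weights for num_matrices hidden-to-hidden matrices."""
    return num_matrices * hidden * hidden


def matvec(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rrntn_step(h_prev, x, word, index, W, U, b):
    """One assumed r-RNTN step: h_t = tanh(W x + U[i] h_prev + b).

    Words missing from the index fall back to the shared matrix (the last one).
    """
    U_w = U[index.get(word, len(U) - 1)]
    wx = matvec(W, x)
    uh = matvec(U_w, h_prev)
    return [math.tanh(p + q + c) for p, q, c in zip(wx, uh, b)]


# Toy setup (all values are illustrative assumptions).
random.seed(0)
tokens = "the cat sat on the mat the cat ran".split()
K, hidden, embed = 2, 3, 2


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


index = build_matrix_index(tokens, K)  # "the" -> 0, "cat" -> 1, all others -> 2
embeddings = {word: [random.uniform(-1, 1) for _ in range(embed)] for word in index}
W = rand_matrix(hidden, embed)
U = [rand_matrix(hidden, hidden) for _ in range(K + 1)]  # K distinct + 1 shared
b = [0.0] * hidden

h = [0.0] * hidden
for word in tokens:
    h = rrntn_step(h, embeddings[word], word, index, W, U, b)
```

Under these assumptions, the parameter trade-off is easy to see. With a hypothetical hidden size of 100 and a vocabulary of 10,000 words, an unrestricted RNTN would need `recurrent_param_count(100, 10000)` recurrent weights (100,000,000), an r-RNTN with `K = 100` would need `recurrent_param_count(100, 101)` (1,010,000), and a plain RNN would need `recurrent_param_count(100, 1)` (10,000).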