Alternative structures for character-level RNNs
This work addresses the computational cost of character-level RNNs for language modeling, offering an incremental improvement of interest to researchers and practitioners in natural language processing.
The authors tackle the high computational cost of character-level RNNs, which stems from the large hidden representations needed to capture long-term dependencies. They propose two structural modifications: conditioning the character-level representation on the previous word representation, and using the character history to condition the output probabilities. Both are evaluated on multilingual real-world data.
Recurrent neural networks are convenient and efficient models for language modeling. However, when applied at the level of characters instead of words, they suffer from several problems. To model long-term dependencies successfully, the hidden representation needs to be large. This in turn implies higher computational costs, which can become prohibitive in practice. We propose two alternative structural modifications to the classical RNN model. The first consists of conditioning the character-level representation on the previous word representation. The second uses the character history to condition the output probability. We evaluate the performance of the two proposed modifications on challenging, multilingual real-world data.
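
To make the two modifications concrete, here is a minimal, untrained forward-pass sketch in plain Python. The abstract does not give the equations, so everything specific here is an assumption made for illustration: the class name `ConditionedCharRNN`, the weight names (`A`, `R`, `Q`, `P`, `S`, `U`, `B`), the tanh updates, the use of a space as the word boundary, and the choice to realize "character history" as a bias looked up from the last `order` characters. It should be read as one plausible reading of the idea, not as the authors' architecture.

```python
import math
import random


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix as a list of rows (untrained, illustration only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Matrix-vector product on plain lists."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ConditionedCharRNN:
    """Hypothetical character RNN with both conditioning ideas (order >= 1)."""

    def __init__(self, vocab, char_dim=8, word_dim=6, order=2, seed=0):
        rng = random.Random(seed)
        self.index = {c: i for i, c in enumerate(vocab)}
        self.order = order
        self.char_dim, self.word_dim = char_dim, word_dim
        n = len(vocab)
        self.A = rand_matrix(char_dim, n, rng)         # input char -> char state
        self.R = rand_matrix(char_dim, char_dim, rng)  # char-level recurrence
        self.Q = rand_matrix(char_dim, word_dim, rng)  # word state -> char state
        self.P = rand_matrix(word_dim, char_dim, rng)  # char state -> word state
        self.S = rand_matrix(word_dim, word_dim, rng)  # word-level recurrence
        self.U = rand_matrix(n, char_dim, rng)         # char state -> logits
        self.B = rand_matrix(n ** order, n, rng)       # one bias row per context

    def step(self, ch, h, w, history):
        i = self.index[ch]
        a = [row[i] for row in self.A]  # equals A @ one_hot(ch)
        r = matvec(self.R, h)
        q = matvec(self.Q, w)
        # Modification 1 (assumed form): the character state also sees the
        # representation of the previous word through Q @ w.
        h = [math.tanh(x + y + z) for x, y, z in zip(a, r, q)]
        if ch == " ":
            # Assumed word boundary: refresh the word state from the char state.
            p = matvec(self.P, h)
            s = matvec(self.S, w)
            w = [math.tanh(x + y) for x, y in zip(p, s)]
        logits = matvec(self.U, h)
        # Modification 2 (assumed form): the recent character history selects
        # an extra bias that is added to the output logits.
        history = (history + ch)[-self.order:]
        if len(history) == self.order:
            ctx = 0
            for c in history:
                ctx = ctx * len(self.index) + self.index[c]
            logits = [z + b for z, b in zip(logits, self.B[ctx])]
        return softmax(logits), h, w, history

    def run(self, text):
        """Return one next-character distribution per input character."""
        h = [0.0] * self.char_dim
        w = [0.0] * self.word_dim
        history = ""
        dists = []
        for ch in text:
            probs, h, w, history = self.step(ch, h, w, history)
            dists.append(probs)
        return dists
```

For example, `ConditionedCharRNN(vocab=list("abc ")).run("ab ca")` returns five probability vectors over the four symbols. In this sketch the word state changes only at spaces, so most of its cost is paid once per word, and the history bias is a table lookup; this is one way a small character-level hidden state could be supplemented cheaply, which is the trade-off the abstract motivates.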