Neural Speed Reading via Skim-RNN
Skim-RNN addresses efficiency concerns for practitioners who use RNNs in natural language processing: it is a drop-in replacement that reduces latency, though the contribution is incremental in that it builds on existing RNN frameworks.
The paper tackles the high computational cost of recurrent neural networks (RNNs) by introducing Skim-RNN, which dynamically updates only a small fraction of the hidden state for unimportant tokens. It reports significantly reduced computational cost without loss of accuracy across five natural language tasks.
Inspired by the principles of speed reading, we introduce Skim-RNN, a recurrent neural network (RNN) that dynamically decides to update only a small fraction of the hidden state for relatively unimportant input tokens. Skim-RNN gives a computational advantage over an RNN that always updates the entire hidden state. Skim-RNN uses the same input and output interfaces as a standard RNN and can easily replace RNNs in existing models. In our experiments, we show that Skim-RNN can achieve significantly reduced computational cost without losing accuracy compared to standard RNNs across five different natural language tasks. In addition, we demonstrate that the trade-off between accuracy and speed of Skim-RNN can be dynamically controlled at inference time in a stable manner. Our analysis also shows that Skim-RNN running on a single CPU offers lower latency than standard RNNs on GPUs.
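To make the mechanism concrete, the following is a minimal sketch of a skim-or-read step in plain Python. It is not the authors' implementation: the class name `SkimRNNSketch`, the weight layout, the tanh cells, the sigmoid decision score, and the `skim_threshold` knob are all assumptions chosen for illustration, and the weights are random rather than trained. The only behavior taken from the abstract is that a per-token decision selects between updating the whole hidden state and updating a small fraction of it while leaving the rest unchanged.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def make_matrix(rows, cols, rng):
    """Random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SkimRNNSketch:
    """Hypothetical illustration of a skim-or-read recurrent step."""

    def __init__(self, input_size, hidden_size, small_size, seed=0):
        rng = random.Random(seed)
        self.d = hidden_size  # size of the full hidden state
        self.k = small_size   # size of the fraction updated when skimming
        # Full cell: reads [x; h] and rewrites all d hidden units.
        self.W_full = make_matrix(hidden_size, input_size + hidden_size, rng)
        # Small cell (assumed layout): reads [x; h[:k]] and rewrites k units.
        self.W_small = make_matrix(small_size, input_size + small_size, rng)
        # Decision weights (assumed): score [x; h] to choose skim vs. read.
        self.w_dec = make_matrix(1, input_size + hidden_size, rng)[0]

    def skim_probability(self, x, h):
        logit = sum(w * v for w, v in zip(self.w_dec, x + h))
        return 1.0 / (1.0 + math.exp(-logit))

    def step(self, x, h, skim_threshold=0.5):
        """Return (new_hidden_state, did_skim) for one input token."""
        if self.skim_probability(x, h) > skim_threshold:
            # Skim: update only the first k units, copy the remaining d - k.
            head = [math.tanh(v) for v in matvec(self.W_small, x + h[:self.k])]
            return head + h[self.k:], True
        # Read: update the entire hidden state, as a standard RNN would.
        return [math.tanh(v) for v in matvec(self.W_full, x + h)], False

    def run(self, inputs, skim_threshold=0.5):
        """Process a sequence; return the final state and the skim count."""
        h = [0.0] * self.d
        skimmed = 0
        for x in inputs:
            h, did_skim = self.step(x, h, skim_threshold)
            skimmed += int(did_skim)
        return h, skimmed


model = SkimRNNSketch(input_size=3, hidden_size=6, small_size=2, seed=1)
tokens = [[0.1, -0.2, 0.3], [0.5, 0.0, -0.4], [0.2, 0.2, 0.2]]
for threshold in (0.0, 0.5, 1.0):
    _, skimmed = model.run(tokens, skim_threshold=threshold)
    print(f"threshold={threshold}: skimmed {skimmed} of {len(tokens)} tokens")
```

In this sketch a skim step multiplies a k-row matrix instead of a d-row one, which is where the saving would come from when k is much smaller than d. Moving the hypothetical `skim_threshold` at inference time changes how many tokens take the cheap path, which is one plausible way to picture the accuracy/speed trade-off described in the abstract.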