LG, CV | Nov 30, 2016

Effective Quantization Methods for Recurrent Neural Networks

arXiv:1611.10176v1 | 81 citations
Originality: Incremental advance
AI Analysis

This work addresses efficient deployment of RNNs in applications that require reduced storage and faster computation. It is an incremental advance, as it builds on existing quantization techniques.

The paper tackles performance degradation in quantized recurrent neural networks (RNNs) by proposing methods to quantize gates, interlinks, and weights in LSTM and GRU cells. The resulting models match or surpass the previous state-of-the-art quantized RNNs on the PTB and IMDB datasets.

Reducing the bit-widths of the weights, activations, and gradients of a neural network can shrink its storage size and memory usage, and also allow for faster training and inference by exploiting bitwise operations. However, previous attempts at quantizing RNNs show considerable performance degradation when using low bit-width weights and activations. In this paper, we propose methods to quantize the structure of gates and interlinks in LSTM and GRU cells. In addition, we propose balanced quantization methods for weights to further reduce performance degradation. Experiments on the PTB and IMDB datasets confirm the effectiveness of our methods: our models match or surpass the previous state of the art for quantized RNNs.
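
To make the idea of low bit-width weights concrete, the following is a minimal sketch, not the authors' implementation. It assumes a plain k-bit uniform quantizer on [0, 1] and a rank-based stand-in for "balanced" quantization that puts an equal number of weights on each level. The function names `quantize_uniform` and `quantize_balanced` and the choice of output range [-1, 1] are illustrative assumptions; the paper's actual procedure may differ.

```python
import numpy as np

def quantize_uniform(x, k):
    """Round values in [0, 1] to the nearest of 2**k evenly spaced levels."""
    levels = 2 ** k - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

def quantize_balanced(w, k):
    """Illustrative rank-based quantizer (an assumption, not the paper's code).

    Each of the 2**k output levels in [-1, 1] receives an equal share
    of the weights, regardless of how the weights are distributed.
    """
    flat = w.ravel()
    n_levels = 2 ** k
    # Rank of each weight (0 .. n-1); ties are broken by position.
    ranks = np.argsort(np.argsort(flat, kind="stable"), kind="stable")
    # Split the ranks into 2**k equal-sized groups (bin indices 0 .. 2**k - 1).
    bins = (ranks * n_levels) // flat.size
    # Map bin indices to evenly spaced values in [-1, 1].
    q = 2.0 * bins / (n_levels - 1) - 1.0
    return q.reshape(w.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=1024)
    q = quantize_balanced(w, k=2)
    values, counts = np.unique(q, return_counts=True)
    print(values, counts)
```

With bell-shaped weights, a uniform quantizer leaves the outermost levels sparsely populated, whereas the rank-based variant uses every level equally often. This is one way to read the motivation for balancing, under the assumptions stated above.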

Code Implementations: 2 repos