LG, CL · Dec 22, 2022

Training Integer-Only Deep Recurrent Neural Networks

arXiv:2212.11791v1 · 4 citations · h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficient RNN deployment for text and speech applications on resource-constrained edge devices, representing an incremental advance in quantization methods.

The paper tackles the problem of deploying recurrent neural networks (RNNs) on edge devices by developing a quantization-aware training method for integer-only RNNs (iRNNs), achieving a 2x improvement in runtime and 4x reduction in model size while maintaining similar accuracy to full-precision models.

Recurrent neural networks (RNNs) are the backbone of many text and speech applications. These architectures are typically made up of several computationally complex components, such as non-linear activation functions, normalization, bi-directional dependence, and attention. To maintain good accuracy, these components are frequently run using full-precision floating-point computation, making them slow, inefficient, and difficult to deploy on edge devices. In addition, the complex nature of these operations makes them challenging to quantize using standard quantization methods without a significant performance drop. We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN). Our approach supports layer normalization, attention, and an adaptive piecewise linear (PWL) approximation of activation functions, to serve a wide range of state-of-the-art RNNs. The proposed method enables RNN-based language models to run on edge devices with a $2\times$ improvement in runtime and a $4\times$ reduction in model size while maintaining accuracy similar to that of their full-precision counterparts.
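To make the two building blocks named in the abstract more concrete, here is a minimal Python sketch of a piecewise linear approximation of the sigmoid and of symmetric 8-bit quantization. It is an illustration under stated assumptions, not the paper's method: the knots are placed uniformly rather than adaptively, the arithmetic is done in floating point for readability rather than in integers, and the names `make_pwl`, `pwl_eval`, `quantize_int8`, and `dequantize` are hypothetical.

```python
import math
from bisect import bisect_right


def sigmoid(x: float) -> float:
    """Reference full-precision sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def make_pwl(fn, lo: float, hi: float, pieces: int):
    """Sample fn at uniformly spaced knots on [lo, hi].

    Assumption: uniform knots. The paper describes an adaptive PWL,
    which this sketch does not reproduce.
    """
    knots = [lo + (hi - lo) * i / pieces for i in range(pieces + 1)]
    values = [fn(k) for k in knots]
    return knots, values


def pwl_eval(x: float, knots, values) -> float:
    """Evaluate the piecewise linear interpolant, saturating outside the range."""
    if x <= knots[0]:
        return values[0]
    if x >= knots[-1]:
        return values[-1]
    i = bisect_right(knots, x) - 1  # index of the segment containing x
    x0, x1 = knots[i], knots[i + 1]
    y0, y1 = values[i], values[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def quantize_int8(x: float, scale: float) -> int:
    """Symmetric quantization: round to the nearest step, then clamp to int8."""
    q = round(x / scale)
    return max(-128, min(127, q))


def dequantize(q: int, scale: float) -> float:
    """Map an int8 code back to its real value."""
    return q * scale


def max_pwl_error(knots, values, lo: float, hi: float, samples: int = 2001) -> float:
    """Largest absolute gap between the PWL and the true sigmoid on a grid."""
    worst = 0.0
    for j in range(samples):
        x = lo + (hi - lo) * j / (samples - 1)
        worst = max(worst, abs(pwl_eval(x, knots, values) - sigmoid(x)))
    return worst


if __name__ == "__main__":
    knots, values = make_pwl(sigmoid, -8.0, 8.0, 16)
    print("max PWL error:", max_pwl_error(knots, values, -10.0, 10.0))
    print("int8 code for 0.3 at scale 0.01:", quantize_int8(0.3, 0.01))
```

Within each segment the approximation needs only a comparison, a multiply, and an add, which is why a PWL form is convenient for integer-only hardware. Storing weights as 8-bit codes instead of 32-bit floats would also be consistent with the reported $4\times$ size reduction, although the bit widths are an assumption here and are not stated in the abstract.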

