Quantization of Deep Neural Networks for Accumulator-constrained Processors
This work addresses efficient fixed-point deployment of neural networks on resource-constrained embedded platforms, and represents an incremental improvement in quantization techniques.
The paper tackles the problem of deploying deep neural networks on embedded processors that lack wide accumulator registers. It introduces a quantization methodology that maximizes the bit width of inputs and weights for a given accumulator size. With 16-bit accumulators, it achieves classification accuracy within 1% of the floating-point baselines on the CIFAR-10 and ILSVRC2012 benchmarks, along with a near-optimal 2x speedup on an ARM processor.
We introduce an Artificial Neural Network (ANN) quantization methodology for platforms without wide accumulation registers. This enables fixed-point model deployment on embedded compute platforms that are not specifically designed for large kernel computations (i.e., accumulator-constrained processors). We formulate the quantization problem as a function of accumulator size, and aim to maximize model accuracy by maximizing the bit width of input data and weights. To reduce the number of configurations to consider, only solutions that fully utilize the available accumulator bits are tested. We demonstrate that, with 16-bit accumulators, classification accuracy stays within 1\% of the floating-point baselines on the CIFAR-10 and ILSVRC2012 image classification benchmarks. Additionally, a near-optimal $2\times$ speedup is obtained on an ARM processor by exploiting 16-bit accumulators for image classification with the All-CNN-C and AlexNet networks.
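The idea of testing only configurations that fully use the accumulator can be illustrated with a minimal sketch. It assumes a conservative worst-case overflow bound, in which the input bit width, the weight bit width, and $\lceil \log_2 K \rceil$ guard bits for a kernel of $K$ multiply-accumulate terms must together fit in the accumulator. This bound, the function names `guard_bits` and `full_utilization_configs`, and the `min_bits` parameter are illustrative assumptions, not the formulation used in the paper.

```python
def guard_bits(kernel_size: int) -> int:
    """Extra bits needed to sum kernel_size products without overflow.

    Equals ceil(log2(kernel_size)), computed with integer arithmetic.
    Assumption: worst case, every product has maximal magnitude.
    """
    if kernel_size < 1:
        raise ValueError("kernel_size must be at least 1")
    return (kernel_size - 1).bit_length()


def full_utilization_configs(acc_bits: int, kernel_size: int, min_bits: int = 2):
    """List (input_bits, weight_bits) pairs that use every accumulator bit.

    Assumed constraint (hypothetical, conservative):
        input_bits + weight_bits + guard_bits(kernel_size) == acc_bits
    """
    budget = acc_bits - guard_bits(kernel_size)
    # Each pair splits the remaining budget between inputs and weights.
    return [(x, budget - x) for x in range(min_bits, budget - min_bits + 1)]


# Example: a 3x3 convolution over 3 input channels (27 terms), 16-bit accumulator.
configs = full_utilization_configs(acc_bits=16, kernel_size=27)
for input_bits, weight_bits in configs:
    print(f"inputs: {input_bits} bits, weights: {weight_bits} bits")
```

Restricting the search to pairs that exhaust the bit budget turns a two-dimensional search over input and weight widths into a one-dimensional one per layer. A less pessimistic overflow bound than the one assumed here would leave more bits for inputs and weights.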