An Overview of Datatype Quantization Techniques for Convolutional Neural Networks
This is an incremental overview of methods to enable CNN deployment on mobile and embedded devices.
The paper addresses the high hardware demands of Convolutional Neural Networks (CNNs) on low-power devices by describing and comparing quantization techniques that reduce the precision of floating-point weights and activations while maintaining network performance.
Convolutional Neural Networks (CNNs) are becoming increasingly popular due to their superior performance in computer vision applications such as object detection and recognition. However, they demand complex, power-hungry hardware, which makes them unsuitable for low-power mobile and embedded devices. This paper describes and compares various techniques that aim to mitigate this problem. They do so primarily by quantizing the floating-point weights and activations to reduce the hardware requirements, and by adapting the training and inference algorithms to maintain the network's performance.
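As a minimal illustration of what quantizing floating-point weights means, the sketch below maps a list of weights to 8-bit signed integers using a single shared scale factor (symmetric, per-tensor uniform quantization). This is only one simple scheme chosen for illustration and is not necessarily one of the techniques compared in the paper; the function names `quantize_symmetric` and `dequantize` and the sample weights are hypothetical.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one shared scale (illustrative scheme)."""
    qmax = 2 ** (num_bits - 1) - 1  # largest representable magnitude, 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integers and the scale."""
    return [qi * scale for qi in q]


# Hypothetical example weights, not taken from the paper.
weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_symmetric(weights)
approx = dequantize(q, scale)
errors = [abs(a - w) for a, w in zip(approx, weights)]
```

Each weight is then stored in 8 bits instead of 32, and the reconstruction error of every value is bounded by half a quantization step (`scale / 2`). Limiting the effect of that error on accuracy is what the adapted training and inference algorithms mentioned above are meant to achieve.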