Intra-layer Nonuniform Quantization for Deep Convolutional Neural Network
This work addresses memory efficiency for deploying DCNNs in resource-constrained hardware or software, representing an incremental improvement over existing quantization techniques.
The authors tackled the high memory requirements of deep convolutional neural networks by proposing two nonuniform quantization schemes, ENQ and KNQ, which reduced memory storage by about 50% for VGG-16 and AlexNet while maintaining or improving classification accuracy compared to state-of-the-art methods.
Deep convolutional neural networks (DCNNs) have achieved remarkable performance on object detection and speech recognition in recent years. However, the excellent performance of a DCNN comes at the cost of high computational complexity and large memory requirements. In this paper, an equal distance nonuniform quantization (ENQ) scheme and a K-means clustering nonuniform quantization (KNQ) scheme are proposed to reduce the required memory storage when low complexity hardware or software implementations are considered. For the VGG-16 and the AlexNet, the proposed nonuniform quantization schemes reduce the required memory storage by approximately 50% while achieving almost the same or even better classification accuracy compared to the state-of-the-art quantization method. Compared to the ENQ scheme, the proposed KNQ scheme provides a better tradeoff when higher accuracy is required.
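To make the idea of clustering-based nonuniform quantization concrete, the following is a minimal sketch of generic 1-D K-means weight sharing within a single layer. It is not the authors' exact KNQ procedure, which the abstract does not specify. The function name `kmeans_quantize`, the parameters `num_levels` and `num_iters`, the evenly spaced initialization, and the synthetic Gaussian weights are all assumptions made for illustration.

```python
import math
import random


def kmeans_quantize(weights, num_levels=8, num_iters=20):
    """Cluster a flat list of weights into num_levels shared values (1-D Lloyd's algorithm).

    Hypothetical helper for illustration; requires num_levels >= 2.
    Returns (codebook, indices) so that codebook[indices[i]] approximates weights[i].
    """
    lo, hi = min(weights), max(weights)
    # Assumption: start from centroids spaced evenly across the weight range.
    centroids = [lo + (hi - lo) * k / (num_levels - 1) for k in range(num_levels)]

    def nearest(w):
        # Index of the centroid closest to weight w.
        return min(range(num_levels), key=lambda k: abs(w - centroids[k]))

    for _ in range(num_iters):
        # Assignment step: map every weight to its nearest centroid.
        indices = [nearest(w) for w in weights]
        # Update step: move each centroid to the mean of its members.
        for k in range(num_levels):
            members = [w for w, i in zip(weights, indices) if i == k]
            if members:  # leave empty clusters where they are
                centroids[k] = sum(members) / len(members)

    indices = [nearest(w) for w in weights]
    return centroids, indices


# Synthetic stand-in for one layer's weights (not data from the paper).
random.seed(0)
layer_weights = [random.gauss(0.0, 0.05) for _ in range(2000)]

codebook, indices = kmeans_quantize(layer_weights, num_levels=8)
reconstructed = [codebook[i] for i in indices]

# Each weight is stored as an index into the codebook instead of a full value.
bits_per_weight = math.ceil(math.log2(len(codebook)))
```

Under these assumptions, each weight is stored as a `bits_per_weight`-bit index plus a small per-layer codebook, and the quantization levels are placed where the weights are dense rather than on a uniform grid. The memory savings and accuracy reported in the abstract depend on the authors' actual schemes and bit widths, which this sketch does not reproduce.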