CV · Jun 18, 2021

Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration

Qigong Sun, Xiufang Li, Fanhua Shang, Hongying Liu, Kang Yang, Licheng Jiao, Zhouchen Lin

arXiv:2106.09886v1 · 1.4 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of deploying neural networks on resource-constrained devices such as mobile phones and embedded systems, and it offers a practical solution for industrial applications. It appears incremental, however, because it builds on existing quantization and binary-network techniques.

Deep neural networks demand intensive computation and storage, which limits their use on mobile and embedded devices. The authors propose a {-1, +1} encoding scheme that decomposes quantized neural networks into multi-branch binary networks, achieving model compression, acceleration, and resource savings. On tasks such as ImageNet classification, the low-bit encodings reach performance comparable to that of high-bit methods.

The training of deep neural networks (DNNs) always requires intensive resources for both computation and data storage. Thus, DNNs cannot be efficiently applied to mobile phones and embedded devices, which severely limits their applicability in industrial applications. To address this issue, we propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks, which can be efficiently implemented by bitwise operations (i.e., xnor and bitcount) to achieve model compression, computational acceleration, and resource saving. By using our method, users can achieve different encoding precisions arbitrarily according to their requirements and hardware resources. The proposed mechanism is highly suitable for the use of FPGA and ASIC in terms of data storage and computation, which provides a feasible idea for smart chips. We validate the effectiveness of our method on large-scale image classification (e.g., ImageNet), object detection, and semantic segmentation tasks. In particular, our method with low-bit encoding can still achieve almost the same performance as its high-bit counterparts.
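
The decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that an M-bit quantized value is an odd integer in [-(2^M - 1), 2^M - 1] written as a sum of 2^i * s_i with each s_i in {-1, +1}, so that a quantized dot product splits into M × K binary dot products, each computed with xnor and bitcount. The function names (`encode_pm1`, `pack`, `binary_dot`, `decomposed_dot`) are hypothetical and chosen only for this example.

```python
def encode_pm1(values, bits):
    """Split odd integers into `bits` planes of {-1, +1}.

    Assumed encoding: q == sum(2**i * planes[i][k]) for every element k.
    """
    top = 2**bits - 1
    planes = []
    for i in range(bits):
        plane = []
        for q in values:
            if q % 2 == 0 or abs(q) > top:
                raise ValueError(f"{q} is not an odd integer in [-{top}, {top}]")
            u = (q + top) // 2  # shift to an unsigned integer in [0, top]
            plane.append(1 if (u >> i) & 1 else -1)
        planes.append(plane)
    return planes


def pack(plane):
    """Pack a {-1, +1} vector into an integer bitmask (+1 -> 1, -1 -> 0)."""
    word = 0
    for k, s in enumerate(plane):
        if s == 1:
            word |= 1 << k
    return word


def binary_dot(wa, wb, n):
    """Dot product of two packed {-1, +1} vectors of length n."""
    mask = (1 << n) - 1
    agree = ~(wa ^ wb) & mask              # xnor: 1 where the signs match
    return 2 * bin(agree).count("1") - n   # bitcount: matches minus mismatches


def decomposed_dot(x, w, x_bits, w_bits):
    """Quantized dot product as a weighted sum of binary branch outputs."""
    n = len(x)
    x_words = [pack(p) for p in encode_pm1(x, x_bits)]
    w_words = [pack(p) for p in encode_pm1(w, w_bits)]
    total = 0
    for i, xa in enumerate(x_words):
        for j, wb in enumerate(w_words):
            total += (1 << (i + j)) * binary_dot(xa, wb, n)
    return total


# 2-bit activations and 2-bit weights, compared with the ordinary dot product
x = [3, -1, 1, -3]
w = [1, 1, -1, 3]
print(decomposed_dot(x, w, 2, 2))            # -8
print(sum(a * b for a, b in zip(x, w)))      # -8
```

Under these assumptions, changing `x_bits` or `w_bits` only changes the number of binary branches, which is one way to read the abstract's claim that the encoding precision can be chosen to match the available hardware.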
