LG CV ML

Dec 12, 2024

DQA: An Efficient Method for Deep Quantization of Deep Neural Network Activations

arXiv:2412.09687v1 2.6 h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses the need for practical low-bit quantization on devices with limited compute and memory, though it appears incremental because it builds on existing quantization techniques.

The paper tackles efficient quantization of deep neural network activations to sub-6-bit precision for resource-constrained devices. It proposes DQA, which uses simple shifting-based operations and Huffman coding, and reports up to 29.28% better accuracy than direct quantization and the state-of-the-art NoisyQuant.

Quantization of Deep Neural Network (DNN) activations is a commonly used technique to reduce compute and memory demands during DNN inference, which can be particularly beneficial on resource-constrained devices. To achieve high accuracy, existing methods for quantizing activations rely on complex mathematical computations or perform extensive searches for the best hyper-parameters. However, these expensive operations are impractical on devices with limited computation capabilities, memory capacities, and energy budgets. Furthermore, many existing methods do not focus on sub-6-bit (or deep) quantization. To fill these gaps, in this paper we propose DQA (Deep Quantization of DNN Activations), a new method that focuses on sub-6-bit quantization of activations and leverages simple shifting-based operations and Huffman coding to be efficient and achieve high accuracy. We evaluate DQA with 3, 4, and 5-bit quantization levels and three different DNN models for two different tasks, image classification and image segmentation, on two different datasets. DQA shows significantly better accuracy (up to 29.28%) compared to the direct quantization method and the state-of-the-art NoisyQuant for sub-6-bit quantization.
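The abstract names two ingredients, shifting-based operations and Huffman coding, without giving the algorithm. The Python sketch below only illustrates those two generic building blocks and is not the authors' DQA procedure: it quantizes a non-negative activation uniformly, drops low-order bits with a right shift, and builds a Huffman code table for the dropped bits. All function names, the 6-bit starting width, and the 3-bit shift are assumptions chosen for illustration.

```python
import heapq
from collections import Counter


def quantize_uint(x, scale, bits):
    """Uniform (direct) quantization of a non-negative value to `bits` bits."""
    q = round(x / scale)
    return max(0, min((1 << bits) - 1, q))


def shift_down(q, extra_bits):
    """Drop `extra_bits` low-order bits with a right shift.

    Returns (coarse, residual), where residual holds the dropped bits.
    """
    coarse = q >> extra_bits
    residual = q & ((1 << extra_bits) - 1)
    return coarse, residual


def shift_up(coarse, residual, extra_bits):
    """Inverse of shift_down: left shift and restore the dropped bits."""
    return (coarse << extra_bits) | residual


def huffman_code(symbols):
    """Build a Huffman code table {symbol: bitstring} from a symbol sequence."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


if __name__ == "__main__":
    # Hypothetical activations in [0, 1]; not data from the paper.
    activations = [0.0, 0.1, 0.12, 0.5, 0.9, 0.11, 0.02, 0.13]
    scale = 1.0 / 63  # maps [0, 1] onto the 6-bit range 0..63
    q6 = [quantize_uint(a, scale, 6) for a in activations]
    pairs = [shift_down(q, 3) for q in q6]  # 3-bit values plus 3-bit residuals
    table = huffman_code([r for _, r in pairs])
    encoded_bits = sum(len(table[r]) for _, r in pairs)
    print("3-bit values:", [c for c, _ in pairs])
    print("residual code table:", table)
    print("residual bits (Huffman vs fixed):", encoded_bits, "vs", 3 * len(pairs))
```

The right shift needs no multiplication or search, which is the kind of cost profile the abstract argues for on constrained devices. Whether and how DQA stores or codes the shifted-out bits is not stated in the abstract, so the residual handling here is purely illustrative.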
