LG, AI, CV, NE
Mar 22, 2024

Magic for the Age of Quantized DNNs

arXiv:2403.14999v1
h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses model compression for product integration, offering an incremental improvement in quantization techniques for DNNs.

The paper tackles the challenge of deploying large DNNs on small-scale computers by proposing a quantization-aware training method called MaQD. The method introduces Layer-Batch Normalization and uses scaled round-clip functions to quantize weights and activations, with minimal accuracy degradation reported in experiments.

Recently, the number of parameters in DNNs has increased explosively, as exemplified by LLMs (Large Language Models), making inference on small-scale computers more difficult. Model compression technology is therefore essential for integration into products. In this paper, we propose a method of quantization-aware training. We introduce a novel normalization (Layer-Batch Normalization) that is independent of the mini-batch size and does not require any additional computation cost during inference. We then quantize the weights using the scaled round-clip function together with weight standardization. We also quantize activation functions using the same function and apply surrogate gradients to train the model with both quantized weights and quantized activation functions. We call this method Magic for the age of Quantised DNNs (MaQD). Experimental results show that our method quantizes models with minimal accuracy degradation.
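
The abstract names three ingredients (weight standardization, a scaled round-clip function, and surrogate gradients) without giving their formulas. The following is a minimal NumPy sketch of how such pieces are commonly written, not the paper's exact definitions. The function names, the `scale` and `max_level` parameters, the choice to standardize over the whole tensor, and the straight-through-style surrogate are all assumptions made for illustration.

```python
import numpy as np

def standardize(w, eps=1e-5):
    """Weight standardization (assumed here over the whole tensor):
    shift to zero mean and rescale to roughly unit variance."""
    return (w - w.mean()) / (w.std() + eps)

def scaled_round_clip(x, scale=1.0, max_level=1):
    """Assumed form of a scaled round-clip quantizer:
    scale, round to the nearest integer, clip to [-max_level, max_level],
    then undo the scaling."""
    return np.clip(np.round(x * scale), -max_level, max_level) / scale

def surrogate_grad(x, upstream, scale=1.0, max_level=1):
    """Straight-through-style surrogate gradient (an assumption):
    rounding has zero gradient almost everywhere, so pass the upstream
    gradient unchanged inside the clip range and block it outside."""
    inside = np.abs(x * scale) <= max_level
    return upstream * inside

# Hypothetical weight matrix, quantized to the ternary grid {-1, 0, 1}.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))
w_q = scaled_round_clip(standardize(w), scale=1.0, max_level=1)
```

With `max_level=1` and `scale=1.0` the quantized weights take only the values -1, 0, and 1. Raising `max_level` or `scale` yields a finer grid, and the same function could be applied to activations, as the abstract describes.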

