LG AR | Feb 25

SigmaQuant: Hardware-Aware Heterogeneous Quantization Method for Edge DNN Inference

Qunyou Liu, Pengbo Yu, Marina Zapater, David Atienza

arXiv:2602.22136v1 | 2.71 citations | h-index: 3 | IEEE Transactions on Circuits and Systems for Artificial Intelligence

Originality: Incremental advance

AI Analysis

This work addresses the challenge of optimizing DNN inference for edge devices with varying hardware conditions, representing an incremental improvement over existing heterogeneous quantization methods.

The paper tackles the problem of deploying deep neural networks on edge devices with resource constraints by introducing SigmaQuant, a heterogeneous quantization framework that adapts bitwidths per layer to balance accuracy and resource usage, achieving efficient deployment without exhaustive search.

Deep neural networks (DNNs) are essential for performing advanced tasks on edge or mobile devices, yet their deployment is often hindered by severe resource constraints, including limited memory, energy, and computational power. While uniform quantization provides a straightforward way to compress models and reduce hardware requirements, it fails to exploit the varying robustness across layers and often leads to accuracy degradation or suboptimal resource usage, particularly at low bitwidths. In contrast, heterogeneous quantization, which allocates different bitwidths to individual layers, can mitigate these drawbacks. Nonetheless, current heterogeneous quantization methods either require a large brute-force design-space search or lack the adaptability to meet different hardware conditions, such as memory size, energy budget, and latency requirements. To fill these gaps, this work introduces SigmaQuant, an adaptive layer-wise heterogeneous quantization framework designed to efficiently balance accuracy and resource usage for varied edge environments without exhaustive search.
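To make the contrast between uniform and heterogeneous quantization concrete, the following is a minimal sketch and not the SigmaQuant algorithm, which the abstract does not detail. The layer names, parameter counts, and bitwidth assignments are hypothetical, and the quantizer is a generic symmetric uniform quantizer chosen for illustration only.

```python
# Illustrative sketch only: this is NOT the SigmaQuant method.
# Layer names, parameter counts, and bitwidths are hypothetical.

def quantize_uniform(weights, bits):
    """Symmetric uniform quantization of a list of floats to `bits` bits (bits >= 2)."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits, 1 for 2 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # step size between levels
    # Round to the nearest integer level, clamp, then map back to floats.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]

def model_size_bits(layer_params, layer_bits):
    """Total weight storage in bits for a given per-layer bitwidth assignment."""
    return sum(n * layer_bits[name] for name, n in layer_params.items())

# Hypothetical network: number of weights per layer.
layer_params = {"conv1": 1_000, "conv2": 50_000, "fc": 200_000}

# Uniform quantization: every layer gets the same bitwidth.
uniform_8 = {name: 8 for name in layer_params}

# Heterogeneous quantization: assumed-sensitive layers keep more bits,
# assumed-robust layers get fewer. This assignment is made up.
mixed = {"conv1": 8, "conv2": 4, "fc": 2}

size_uniform = model_size_bits(layer_params, uniform_8)
size_mixed = model_size_bits(layer_params, mixed)
print(f"uniform 8-bit: {size_uniform} bits, mixed: {size_mixed} bits")
```

In this toy setting the mixed assignment stores the weights in fewer bits than uniform 8-bit quantization. Whether accuracy survives depends on how robust each layer actually is, which is the question a bitwidth-allocation method has to answer.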
