LG | AR | Feb 25

SigmaQuant: Hardware-Aware Heterogeneous Quantization Method for Edge DNN Inference

arXiv:2602.22136v1 | 1 citation | h-index: 18 | IEEE Transactions on Circuits and Systems for Artificial Intelligence
AI Analysis

This work addresses the challenge of optimizing DNN inference for edge devices with varying hardware conditions, representing an incremental improvement over existing heterogeneous quantization methods.

The paper tackles the problem of deploying deep neural networks on edge devices with resource constraints by introducing SigmaQuant, a heterogeneous quantization framework that adapts bitwidths per layer to balance accuracy and resource usage, achieving efficient deployment without exhaustive search.

Deep neural networks (DNNs) are essential for performing advanced tasks on edge or mobile devices, yet their deployment is often hindered by severe resource constraints, including limited memory, energy, and computational power. While uniform quantization provides a straightforward way to compress a model and reduce its hardware requirements, it fails to exploit the varying robustness of individual layers and often leads to accuracy degradation or suboptimal resource usage, particularly at low bitwidths. In contrast, heterogeneous quantization, which allocates different bitwidths to individual layers, can mitigate these drawbacks. Nonetheless, current heterogeneous quantization methods either require an expensive brute-force search of the design space or lack the adaptability to meet different hardware conditions, such as memory size, energy budget, and latency requirements. To fill these gaps, this work introduces SigmaQuant, an adaptive layer-wise heterogeneous quantization framework designed to efficiently balance accuracy and resource usage for varied edge environments without exhaustive search.
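
To make the contrast between uniform and heterogeneous quantization concrete, the following is a minimal Python sketch. It is not SigmaQuant's allocation method, which the abstract does not detail. The layer sizes, the bitwidth assignment, and the helper names `quantize` and `model_size_bytes` are all hypothetical and chosen only for illustration.

```python
# Illustrative only: this is NOT SigmaQuant's bitwidth allocation algorithm.

def quantize(values, bits):
    """Symmetric uniform quantization of a list of floats to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level, clamp to [-qmax, qmax], map back to floats.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def model_size_bytes(param_counts, bitwidths):
    """Weight storage in bytes when layer i stores bitwidths[i] bits per weight."""
    return sum(n * b for n, b in zip(param_counts, bitwidths)) / 8


# Hypothetical three-layer model; the parameter counts are made up.
param_counts = [1_000, 50_000, 10_000]
uniform = [8, 8, 8]   # same bitwidth for every layer
mixed = [8, 4, 6]     # hypothetical per-layer choice: fewer bits for the largest layer

size_uniform = model_size_bytes(param_counts, uniform)
size_mixed = model_size_bytes(param_counts, mixed)
```

In this toy setting the mixed assignment needs roughly half the weight storage of the uniform 8-bit one, because the largest layer dominates the total. Whether a given layer tolerates a lower bitwidth without accuracy loss is exactly the question a heterogeneous method has to answer, and this sketch does not address it.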
