CV · Sep 6, 2025

Sensitivity-Aware Post-Training Quantization for Deep Neural Networks

arXiv:2509.05576v1 · 2 citations · h-index: 8 · PRCV
Originality: Incremental advance
AI Analysis

This work addresses the computational complexity and resource overhead that existing post-training quantization methods incur, which limit their use in resource-constrained edge computing and real-time inference. It offers an incremental improvement over existing methods.

The paper tackles accuracy degradation in post-training quantization of deep neural networks with a sensitivity-aware method that quantizes high-sensitivity parameters first and uses the still-unquantized low-sensitivity parameters to compensate for the resulting errors. On ResNet-50 and YOLOv5s it reports a 20- to 200-fold quantization speedup over the Optimal Brain Quantization baseline, with a mean accuracy loss below 0.3%.
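For reference, the Optimal Brain Quantization (OBQ) baseline quantizes one weight at a time and compensates with the remaining unquantized weights, using the inverse Hessian of the layer-wise reconstruction loss. The standard OBQ equations for one weight row $w$ with unquantized index set $F$ are reproduced below; the paper's own sensitivity criterion may differ from the greedy selection rule on the second line. Because OBQ lets each row pick its own order, each row needs its own sequence of $H_F^{-1}$ updates, which is presumably the cost a shared inverse Hessian targets.

```latex
\begin{aligned}
&\min_{\hat{W}} \lVert W X - \hat{W} X \rVert_2^2, \qquad H_F = 2 X_F X_F^\top \\
&q = \arg\min_{q \in F} \frac{\big(\operatorname{quant}(w_q) - w_q\big)^2}{[H_F^{-1}]_{qq}} \\
&\delta_F = -\frac{w_q - \operatorname{quant}(w_q)}{[H_F^{-1}]_{qq}} \, (H_F^{-1})_{:,q} \\
&H_{F \setminus \{q\}}^{-1} = \left( H_F^{-1} - \frac{1}{[H_F^{-1}]_{qq}} \, (H_F^{-1})_{:,q} \, (H_F^{-1})_{q,:} \right)_{-q}
\end{aligned}
```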

Model quantization reduces neural network parameter precision to achieve compression, but often compromises accuracy. Existing post-training quantization (PTQ) methods employ iterative parameter updates to preserve accuracy under high compression ratios, incurring significant computational complexity and resource overhead, which limits applicability in resource-constrained edge computing and real-time inference scenarios. This paper proposes an efficient PTQ method guided by parameter sensitivity analysis. The approach prioritizes quantization of high-sensitivity parameters, leveraging unquantized low-sensitivity parameters to compensate for quantization errors, thereby mitigating accuracy degradation. Furthermore, by exploiting column-wise clustering of parameter sensitivity, the method introduces a row-parallel quantization framework with a globally shared inverse Hessian matrix update mechanism, reducing computational complexity by an order of magnitude. Experimental results on ResNet-50 and YOLOv5s demonstrate a 20-200-fold quantization speedup over the Optimal Brain Quantization baseline, with mean accuracy loss below 0.3%, confirming the method's efficacy in balancing efficiency and accuracy.
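The row-parallel idea can be illustrated with a minimal NumPy sketch under stated assumptions. The sketch uses a symmetric per-row 4-bit grid. Its per-column sensitivity score is taken as the OBQ saliency summed over rows, a hypothetical stand-in because the paper's actual metric is not given here. A single column order is shared by all rows, so one inverse Hessian downdate serves every row at once. The function names `quantize` and `sensitivity_ordered_ptq` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def quantize(w, scale, qmax):
    """Round-to-nearest onto a symmetric integer grid; `scale` broadcasts per row."""
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def sensitivity_ordered_ptq(W, X, bits=4, damp=0.01):
    """Hypothetical sketch: OBS-style error compensation where every row shares
    one column order and one inverse Hessian (row-parallel)."""
    W = W.astype(np.float64).copy()
    n_cols = W.shape[1]
    qmax = 2 ** (bits - 1) - 1
    # Assumption: symmetric per-row scale derived from the original weights.
    scale = np.maximum(np.abs(W).max(axis=1, keepdims=True) / qmax, 1e-12)

    # Hessian of the layer-wise loss ||WX - W_hat X||^2, dampened for invertibility.
    H = 2.0 * X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(n_cols)
    Hinv = np.linalg.inv(H)

    # Assumed column sensitivity: OBQ saliency summed over all rows, computed once.
    err0 = (W - quantize(W, scale, qmax)) ** 2
    sensitivity = err0.sum(axis=0) / np.diag(Hinv)
    order = np.argsort(-sensitivity)  # most sensitive columns are quantized first

    for j in order:
        d = Hinv[j, j]
        q = quantize(W[:, j:j + 1], scale, qmax)[:, 0]
        e = (W[:, j] - q) / d
        # Compensate all rows at once through the shared inverse Hessian;
        # already-quantized columns have zero entries and stay untouched.
        W -= np.outer(e, Hinv[:, j])
        W[:, j] = q  # pin the column exactly to the grid
        # Remove column j from the shared inverse Hessian (rank-1 downdate).
        Hinv -= np.outer(Hinv[:, j], Hinv[j, :]) / d
        Hinv[j, :] = 0.0
        Hinv[:, j] = 0.0
    return W
```

Because every row follows the same column order, each step costs one rank-1 update on a single shared matrix rather than one per row. That per-row work is where an OBQ-style loop spends much of its time. The trade-off is that rows can no longer choose their own greedy order.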

