CV | Dec 20, 2022

CSMPQ: Class Separability Based Mixed-Precision Quantization

arXiv:2212.10220v1 | 1 citation | h-index: 11
Originality: Incremental advance
AI Analysis

This paper addresses the need for efficient deep learning inference with an approach that improves compression trade-offs, though the advance appears incremental because it builds on existing mixed-precision quantization methods.

The paper tackles the problem of reducing the computational burden of neural networks by proposing CSMPQ, a mixed-precision quantization method that uses class separability to assign per-layer bit-widths without iterative search. It reports 73.03% Top-1 accuracy on ResNet-18 with 59G BOPs and 71.30% Top-1 accuracy on MobileNetV2 with 1.5Mb.

Mixed-precision quantization has received increasing attention for its ability to reduce computational burden and speed up inference. Existing methods usually focus on the sensitivity of different network layers, which requires a time-consuming search or training process. To address this, a novel mixed-precision quantization method, termed CSMPQ, is proposed. Specifically, the TF-IDF metric, widely used in natural language processing (NLP), is introduced to measure the class separability of layer-wise feature maps. Furthermore, a linear programming problem is designed to derive the optimal bit configuration for each layer. Without any iterative process, the proposed CSMPQ achieves better compression trade-offs than state-of-the-art quantization methods. Specifically, CSMPQ achieves 73.03% Top-1 accuracy on ResNet-18 with only 59G BOPs for quantization-aware training (QAT), and 71.30% Top-1 accuracy on MobileNetV2 with only 1.5Mb for post-training quantization (PTQ).
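To make the bit-allocation idea concrete, here is a minimal Python sketch of choosing one bit-width per layer under a model-size budget. It is an illustration under stated assumptions, not the paper's formulation: the objective (sum of per-layer importance times bit-width), the size constraint, the candidate bit-widths, and the function name `allocate_bits` are all hypothetical, and the toy importance scores merely stand in for a class-separability measure. The abstract describes a linear program; this sketch swaps in exhaustive enumeration, which is only practical for a handful of layers.

```python
from itertools import product


def allocate_bits(importance, param_counts, size_budget_bits, choices=(2, 4, 8)):
    """Pick one bit-width per layer (hypothetical objective, not the paper's).

    Maximizes sum(importance[i] * bits[i]) subject to
    sum(param_counts[i] * bits[i]) <= size_budget_bits.
    Returns a tuple of bit-widths, or None if no configuration fits.
    """
    best_config, best_score = None, float("-inf")
    # Exhaustive search over all configurations: a stand-in for an LP solver.
    for config in product(choices, repeat=len(importance)):
        size = sum(n * b for n, b in zip(param_counts, config))
        if size > size_budget_bits:
            continue  # violates the size budget
        score = sum(w * b for w, b in zip(importance, config))
        if score > best_score:
            best_config, best_score = config, score
    return best_config


# Toy inputs (assumed values): higher importance means the layer
# should keep more precision.
importance = [0.9, 0.2, 0.6]
param_counts = [1000, 4000, 2000]
budget = 32000  # total weight storage allowed, in bits

config = allocate_bits(importance, param_counts, budget)
print(config)
```

In this toy setting the large, low-importance middle layer is pushed to the lowest bit-width so that the two more important layers can keep 8 bits within the budget. Because the scores are computed once and the allocation is solved in a single pass, no iterative search or retraining loop is involved, which is the property the abstract emphasizes.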

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
