LG | AI | IT | Mar 3, 2023

Rotation Invariant Quantization for Model Compression

arXiv:2303.03106v3 | 1 citation | h-index: 8 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of deploying large models on resource-constrained devices and represents an incremental improvement in quantization methods.

The paper tackles the problem of compressing neural network models for deployment on memory-limited devices by proposing a Rotation-Invariant Quantization (RIQ) technique, achieving compression ratios of up to 52.9x with less than 0.4% accuracy degradation.

Post-training Neural Network (NN) model compression is an attractive approach for deploying large, memory-consuming models on devices with limited memory resources. In this study, we investigate the rate-distortion tradeoff for NN model compression. First, we suggest a Rotation-Invariant Quantization (RIQ) technique that utilizes a single parameter to quantize the entire NN model, yielding a different rate at each layer, i.e., mixed-precision quantization. Then, we prove that our rotation-invariant approach is optimal in terms of compression. We rigorously evaluate RIQ and demonstrate its capabilities on various models and tasks. For example, RIQ facilitates $\times 19.4$ and $\times 52.9$ compression ratios on pre-trained VGG dense and pruned models, respectively, with $<0.4\%$ accuracy degradation. Code is available at https://github.com/ehaleva/RIQ.
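
To make the idea of "a single parameter yielding a different rate at each layer" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (see the linked repository for that). It assumes a simple rule in which each layer's bin width is the layer's norm divided by one global parameter `k`. The norm does not change when the weights are rotated, which is the property the sketch relies on. The layer names, sizes, the value of `k`, and the helper functions are all hypothetical, and the empirical entropy of the quantized symbols stands in for the rate after entropy coding.

```python
import math
import random
from collections import Counter


def quantize_layer(weights, k):
    """Uniformly quantize one layer; the bin width scales with the layer's norm.

    Assumes the layer is not all zeros (otherwise the bin width would be 0).
    """
    norm = math.sqrt(sum(w * w for w in weights))
    delta = norm / k  # assumed rule: one global k, a different bin width per layer
    symbols = [round(w / delta) for w in weights]
    dequantized = [s * delta for s in symbols]
    return symbols, dequantized


def entropy_bits(symbols):
    """Empirical entropy in bits per symbol, a proxy for the entropy-coded rate."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def cosine_similarity(a, b):
    """Cosine of the angle between the original and dequantized weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


random.seed(0)
# Two toy "layers" with different sizes and scales (synthetic, not real model weights).
layers = {
    "small": [random.gauss(0.0, 0.5) for _ in range(256)],
    "large": [random.gauss(0.0, 0.05) for _ in range(4096)],
}

k = 64.0  # the single global parameter (hypothetical value)
rates = {}
similarities = {}
for name, w in layers.items():
    symbols, w_hat = quantize_layer(w, k)
    rates[name] = entropy_bits(symbols)
    similarities[name] = cosine_similarity(w, w_hat)
    print(name, round(rates[name], 2), "bits/weight, cosine",
          round(similarities[name], 4))

# Compression ratio relative to storing every weight as a 32-bit float.
total_weights = sum(len(w) for w in layers.values())
total_bits = sum(rates[name] * len(w) for name, w in layers.items())
compression_ratio = 32.0 * total_weights / total_bits
print("compression ratio vs float32:", round(compression_ratio, 1))
```

Under this assumed rule, the larger layer receives relatively coarser bins and therefore a lower rate, so one value of `k` produces a mixed-precision assignment without per-layer tuning. For scale, a $\times 19.4$ compression ratio relative to 32-bit floats corresponds to roughly $32 / 19.4 \approx 1.65$ bits per weight on average.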

Code Implementations: 1 repo