LG, Jul 6, 2023

Pruning vs Quantization: Which is Better?

arXiv:2307.02973v2, 128 citations, h-index: 26
Originality: Synthesis-oriented
AI Analysis

This work addresses a foundational question in neural network compression, with the aim of informing hardware design decisions. Its contribution is incremental in that it builds on existing techniques.

The paper compares neural network pruning and quantization as compression methods, combining analytical and experimental analysis across multiple models and tasks. Its results show that quantization generally outperforms pruning, except at very high compression ratios.

Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date only ad-hoc comparisons between the two have been published. In this paper, we set out to answer the question of which is better: neural network quantization or pruning? By answering this question, we hope to inform design decisions made on neural network hardware going forward. We provide an extensive comparison between the two techniques for compressing deep neural networks. First, we give an analytical comparison of expected quantization and pruning error for general data distributions. Then, we provide lower bounds for the per-layer pruning and quantization error in trained networks, and compare these to empirical error after optimization. Finally, we provide an extensive experimental comparison for training 8 large-scale models on 3 tasks. Our results show that in most cases quantization outperforms pruning. Only in some scenarios with a very high compression ratio might pruning be beneficial from an accuracy standpoint.
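
To make the idea of comparing quantization and pruning error on a data distribution concrete, the following is a minimal sketch, not the authors' code or exact procedure. It assumes a standard Gaussian weight vector, symmetric min-max uniform quantization, magnitude pruning, a 16-bit baseline (so 8 bits is matched with 50% sparsity and 4 bits with 75% sparsity, ignoring mask storage), and signal-to-noise ratio in dB as the error measure. The function names `quantize_uniform`, `prune_magnitude`, and `snr_db` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def quantize_uniform(w, bits):
    """Symmetric uniform quantization with a min-max range (an assumption)."""
    levels = 2 ** (bits - 1) - 1           # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / levels     # step size of the grid
    return np.clip(np.round(w / scale), -levels, levels) * scale


def prune_magnitude(w, sparsity):
    """Zero out the fraction `sparsity` of entries with smallest magnitude."""
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    pruned = w.copy()
    pruned[np.abs(w) <= threshold] = 0.0
    return pruned


def snr_db(w, w_hat):
    """Signal-to-noise ratio in dB between original and compressed values."""
    noise = np.mean((w - w_hat) ** 2)
    return 10.0 * np.log10(np.mean(w ** 2) / noise)


rng = np.random.default_rng(0)
w = rng.standard_normal(100_000)           # assumed weight distribution

results = {}
for bits in (8, 4):
    sparsity = 1.0 - bits / 16.0           # equal size relative to 16 bits
    results[bits] = {
        "sparsity": sparsity,
        "quant_snr": snr_db(w, quantize_uniform(w, bits)),
        "prune_snr": snr_db(w, prune_magnitude(w, sparsity)),
    }

for bits, r in results.items():
    print(f"{bits}-bit quantization: {r['quant_snr']:.1f} dB | "
          f"{r['sparsity']:.0%} pruning: {r['prune_snr']:.1f} dB")
```

Under these assumptions, quantization yields the higher SNR at both settings, which is consistent with the direction of the abstract's claim. Heavier-tailed distributions or a different range-setting method can change the gap, so this toy comparison should not be read as a reproduction of the paper's results.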

Code Implementations: 1 repo