LGAIMLJan 26, 2022

Post-training Quantization for Neural Networks with Provable Guarantees

arXiv:2201.11113v356 citations
AI Analysis

This work addresses the challenge of deploying neural networks efficiently in hardware, offering a method with provable guarantees, though it appears incremental as it builds on prior GPFQ results.

The authors tackled the problem of implementing neural networks on resource-constrained hardware by generalizing a post-training quantization method, GPFQ, to handle various quantization alphabets and architectures, showing that quantizing models to few bits per weight on ImageNet results in only minor accuracy loss compared to unquantized models.

While neural networks have been remarkably successful in a wide array of applications, implementing them in resource-constrained hardware remains an area of intense research. By replacing the weights of a neural network with quantized (e.g., 4-bit, or binary) counterparts, massive savings in computation cost, memory, and power consumption are attained. To that end, we generalize a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism. Among other things, we propose modifications to promote sparsity of the weights, and rigorously analyze the associated error. Additionally, our error analysis expands the results of previous work on GPFQ to handle general quantization alphabets, showing that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights -- i.e., level of over-parametrization. Our result holds across a range of input distributions and for both fully-connected and convolutional architectures thereby also extending previous results. To empirically evaluate the method, we quantize several common architectures with few bits per weight, and test them on ImageNet, showing only minor loss of accuracy compared to unquantized models. We also demonstrate that standard modifications, such as bias correction and mixed precision quantization, further improve accuracy.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes