LG · ML · May 29, 2019

Instant Quantization of Neural Networks using Monte Carlo Methods

arXiv:1905.12253v2 · 9 citations
Originality: Incremental advance
AI Analysis

This paper addresses the need for low-power inference in neural networks, but its contribution is incremental because it builds on existing quantization methods that avoid retraining.

The authors address the problem of quantizing neural networks for efficient inference without retraining. They propose Monte Carlo Quantization (MCQ), which uses importance sampling to convert full-precision weights and activations into low bit-width integers. They report minimal accuracy loss and competitive performance on multiple benchmarks.

Low bit-width integer weights and activations are very important for efficient inference, especially with respect to lower power consumption. We propose Monte Carlo methods to quantize the weights and activations of pre-trained neural networks without any re-training. By performing importance sampling we obtain quantized low bit-width integer values from full-precision weights and activations. The precision, sparsity, and complexity are easily configurable by the amount of sampling performed. Our approach, called Monte Carlo Quantization (MCQ), is linear in both time and space, with the resulting quantized, sparse networks showing minimal accuracy loss when compared to the original full-precision networks. Our method either outperforms or achieves competitive results on multiple benchmarks compared to previous quantization methods that do require additional training.
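The sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mc_quantize`, its parameters, and the choice to sample indices in proportion to absolute weight are assumptions made here for illustration. Each sample increments a signed integer counter, so the counters serve as the low bit-width integers, and weights that are never sampled stay at zero, which is where the sparsity comes from.

```python
import bisect
import itertools
import random


def mc_quantize(weights, num_samples, seed=0):
    """Hypothetical helper: quantize floats to signed integer hit counts.

    Returns (counts, scale) such that counts[i] * scale approximates
    weights[i]. More samples give higher precision and fewer zeros.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0.0 or num_samples <= 0:
        return [0] * len(weights), 0.0

    # Cumulative distribution over indices; index i has probability |w_i| / l1.
    cdf = list(itertools.accumulate(abs(w) / l1 for w in weights))

    rng = random.Random(seed)
    counts = [0] * len(weights)
    for _ in range(num_samples):
        # Inverse-CDF sampling: first index whose cumulative value exceeds u.
        # The clamp guards against floating-point round-off in the last entry.
        i = min(bisect.bisect_right(cdf, rng.random()), len(weights) - 1)
        w = weights[i]
        counts[i] += (w > 0) - (w < 0)  # +1, -1, or 0 for a zero weight

    # Each hit represents l1 / num_samples of the total absolute weight.
    return counts, l1 / num_samples


if __name__ == "__main__":
    w = [0.5, -0.25, 0.0, 0.25]
    counts, scale = mc_quantize(w, num_samples=16)
    print(counts, [c * scale for c in counts])
```

The binary search makes this sketch O(K log N) for K samples over N weights, so it does not reproduce the linear time bound stated in the abstract. It also covers only a single weight vector, not activations or per-layer handling.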

