A probabilistic framework for dynamic quantization
This work addresses efficient quantization of neural networks, with validation on computer vision tasks and models.
The authors address dynamic quantization of neural networks by proposing a probabilistic framework that adapts the quantization parameters to each input at low computational cost, with only a negligible loss in performance on computer vision tasks.
We propose a probabilistic framework for dynamic quantization of neural networks that allows for computationally efficient, input-adaptive rescaling of the quantization parameters. Our framework applies a probabilistic model to the network's pre-activations through a lightweight surrogate, enabling the quantization parameters to be adjusted on a per-input basis without significant memory overhead. We validate our approach on a set of popular computer vision tasks and models, observing only a negligible loss in performance. Compared to standard quantization strategies, our method strikes the best tradeoff between performance and computational overhead.
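To make the idea of input-adaptive rescaling concrete, the following is a minimal sketch of one way such a scheme could look for a single linear layer. It is not the authors' implementation: the abstract does not specify the probabilistic model or the surrogate, so everything below is an assumption made for illustration. In particular, the sketch assumes the input entries can be treated as independent with a shared mean and variance, approximates the pre-activations by a distribution with matching first two moments, and sets the quantization range to the estimated mean plus or minus `k` standard deviations. The function names and the parameter `k` are hypothetical.

```python
import numpy as np

N_LEVELS = 255  # number of steps in an unsigned 8-bit grid (assumed bit width)


def precompute_weight_stats(W):
    """Offline step: per-output-row sums of the weights and squared weights."""
    return W.sum(axis=1), (W ** 2).sum(axis=1)


def estimate_range(x, row_sum, row_sq_sum, b, k=4.0):
    """Estimate a quantization range for y = W @ x + b without computing y.

    Assumption (illustrative only): entries of x are independent with a
    shared mean m_x and variance v_x, so each pre-activation y_j has
    mean m_x * sum_i W[j, i] + b[j] and variance v_x * sum_i W[j, i]**2.
    """
    m_x, v_x = x.mean(), x.var()
    mu_j = m_x * row_sum + b        # per-output mean
    var_j = v_x * row_sq_sum        # per-output variance
    # Moments of the mixture over all outputs of the layer.
    mu = mu_j.mean()
    var = (var_j + mu_j ** 2).mean() - mu ** 2
    sigma = np.sqrt(max(var, 1e-12))
    return mu - k * sigma, mu + k * sigma


def quantize(y, lo, hi):
    """Uniform affine quantization of y onto an 8-bit grid spanning [lo, hi]."""
    scale = (hi - lo) / N_LEVELS
    q = np.clip(np.round((y - lo) / scale), 0, N_LEVELS)
    return q.astype(np.uint8), scale


def dequantize(q, scale, lo):
    """Map 8-bit codes back to floating-point values."""
    return q.astype(np.float64) * scale + lo


# Usage: the range is re-estimated for every input from two cheap statistics.
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(32, 64))
b = rng.normal(0.0, 0.1, size=32)
row_sum, row_sq_sum = precompute_weight_stats(W)

x = rng.normal(0.0, 1.0, size=64)
lo, hi = estimate_range(x, row_sum, row_sq_sum, b)
y = W @ x + b
q, scale = quantize(y, lo, hi)
y_hat = dequantize(q, scale, lo)
```

In this sketch the per-input cost of choosing the range is one mean and one variance over the input plus a few vector operations over the outputs, and the only extra storage is two vectors of per-row weight statistics. Whether this matches the cost profile of the proposed framework is not something the abstract allows us to confirm.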