LG · Feb 23, 2018

Loss-aware Weight Quantization of Deep Networks

arXiv:1802.08635v2 · 136 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of deploying deep networks on resource-constrained devices, offering an incremental improvement in quantization methods.

The paper tackles the problem of compressing deep networks for small computing devices by extending a loss-aware weight binarization scheme to ternarization and m-bit quantization, achieving results that outperform state-of-the-art weight quantization algorithms and match or exceed the accuracy of full-precision networks.

The huge size of deep networks hinders their use in small computing devices. In this paper, we consider compressing the network by weight quantization. We extend a recently proposed loss-aware weight binarization scheme to ternarization, with possibly different scaling parameters for the positive and negative weights, and to m-bit (where m > 2) quantization. Experiments on feedforward and recurrent neural networks show that the proposed scheme outperforms state-of-the-art weight quantization algorithms, and is as accurate as (or even more accurate than) the full-precision network.
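The abstract describes the quantizer only at a high level. The sketch below illustrates the projection step in a hedged form: given full-precision weights `w` and a positive per-weight curvature weighting `d`, it approximately minimizes the weighted error sum_i d_i (q_i - w_i)^2. It does this by alternating between picking the nearest quantization level for each weight and refitting the scale(s) by weighted least squares. Everything here is an assumption for illustration, not the paper's published algorithm: `d` stands in for a diagonal curvature estimate (e.g. an optimizer's second-moment statistics), and the alternating scheme, the initialization, the uniform m-bit grid, and the names `ternarize_asymmetric` and `quantize_mbit` are all hypothetical. The training loop and gradient step are not shown.

```python
from typing import List, Tuple


def _ternary_codes(w: List[float], a_pos: float, a_neg: float) -> List[int]:
    # Nearest level in {-a_neg, 0, +a_pos}; a positive weight d_i does not change which level is nearest.
    return [1 if x > a_pos / 2 else (-1 if x < -a_neg / 2 else 0) for x in w]


def ternarize_asymmetric(w: List[float], d: List[float], iters: int = 50) -> Tuple[float, float, List[int]]:
    """Hypothetical sketch: alternating minimization of sum_i d_i * (q_i - w_i)**2,
    with q_i in {-a_neg, 0, +a_pos} (separate positive and negative scales)."""
    pos = [x for x in w if x > 0]
    neg = [-x for x in w if x < 0]
    # Assumed initialization: mean magnitude of each sign group.
    a_pos = sum(pos) / len(pos) if pos else 0.0
    a_neg = sum(neg) / len(neg) if neg else 0.0
    for _ in range(iters):
        b = _ternary_codes(w, a_pos, a_neg)
        # Scale step: each scale is the d-weighted mean magnitude of the weights mapped to it.
        num_p = sum(di * x for x, di, bi in zip(w, d, b) if bi == 1)
        den_p = sum(di for di, bi in zip(d, b) if bi == 1)
        num_n = sum(-di * x for x, di, bi in zip(w, d, b) if bi == -1)
        den_n = sum(di for di, bi in zip(d, b) if bi == -1)
        new_pos = num_p / den_p if den_p > 0 else a_pos
        new_neg = num_n / den_n if den_n > 0 else a_neg
        if new_pos == a_pos and new_neg == a_neg:
            break  # fixed point: codes and scales are mutually consistent
        a_pos, a_neg = new_pos, new_neg
    return a_pos, a_neg, _ternary_codes(w, a_pos, a_neg)


def quantize_mbit(w: List[float], d: List[float], m: int, iters: int = 50) -> Tuple[float, List[float]]:
    """Hypothetical sketch: q_i = alpha * c_i with c_i on the assumed uniform grid
    {-1, ..., -1/k, 0, 1/k, ..., 1}, k = 2**(m - 1) - 1."""
    assert m >= 2, "m = 2 gives the symmetric ternary grid; m = 1 has no nonzero levels here"
    k = 2 ** (m - 1) - 1

    def codes(alpha: float) -> List[float]:
        # Nearest grid point to w_i / alpha, clamped to [-1, 1].
        return [max(-k, min(k, round(x / alpha * k))) / k for x in w]

    alpha = max(abs(x) for x in w) or 1.0  # assumed initialization
    for _ in range(iters):
        c = codes(alpha)
        # Weighted least-squares refit of the single scale alpha given the codes.
        num = sum(di * ci * x for x, di, ci in zip(w, d, c))
        den = sum(di * ci * ci for di, ci in zip(d, c))
        new_alpha = num / den if den > 0 else alpha
        if new_alpha == alpha:
            break
        alpha = new_alpha
    return alpha, codes(alpha)
```

For example, `ternarize_asymmetric([0.9, 1.1, 0.05, -0.4, -0.6, -0.02], [1.0] * 6)` returns `(1.0, 0.5, [1, 1, 0, -1, -1, 0])`. The two scales settle at different values, which is the kind of asymmetry that separate positive and negative scaling parameters allow. Each code step and each scale step exactly minimizes the weighted error with the other held fixed, so the error never increases. The procedure may still stop at a local rather than global optimum.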

Code Implementations: 1 repo
