LG, AI, ML · Mar 6, 2018

Deep Neural Network Compression with Single and Multiple Level Quantization

arXiv:1803.03289v2 · 124 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of deploying large neural networks in resource-constrained environments, offering an incremental improvement over existing quantization methods.

The paper tackles the problem of compressing deep neural networks by proposing two novel quantization approaches: single-level quantization (SLQ) for high-bit compression and multi-level quantization (MLQ) for extremely low-bit (ternary) compression, achieving impressive results on networks like AlexNet, VGG-16, GoogleNet, and ResNet-18.

Network quantization is an effective solution to compress deep neural networks for practical usage. Existing network quantization methods cannot sufficiently exploit the depth information to generate low-bit compressed networks. In this paper, we propose two novel network quantization approaches: single-level network quantization (SLQ) for high-bit quantization and multi-level network quantization (MLQ) for extremely low-bit quantization (ternary). We are the first to consider network quantization from both the width and depth levels. At the width level, parameters are divided into two parts: one for quantization and the other for re-training to eliminate the quantization loss. SLQ leverages the distribution of the parameters to improve the width level. At the depth level, we introduce incremental layer compensation to quantize layers iteratively, which decreases the quantization loss in each iteration. The proposed approaches are validated with extensive experiments based on state-of-the-art neural networks including AlexNet, VGG-16, GoogleNet and ResNet-18. Both SLQ and MLQ achieve impressive results.
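The width-level idea in the abstract (split a layer's parameters into a quantized part and a part left for re-training) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state how the two parts are chosen, so the magnitude-based selection, the function name `partition_and_quantize`, the `fraction` parameter, and the fixed `codebook` are all assumptions made for illustration.

```python
import numpy as np

def partition_and_quantize(weights, fraction, codebook):
    """Quantize a fraction of the weights; leave the rest for re-training.

    Assumption (not from the paper): the largest-magnitude weights are the
    ones quantized, and each is snapped to its nearest codebook value.
    Returns (new_weights, mask), where mask is True for quantized entries.
    """
    flat = weights.ravel()
    n_quant = int(round(fraction * flat.size))

    # Pick the n_quant largest-magnitude weights (hypothetical selection rule).
    order = np.argsort(-np.abs(flat))
    chosen = order[:n_quant]
    mask = np.zeros(flat.size, dtype=bool)
    mask[chosen] = True

    # Snap each chosen weight to the nearest codebook entry.
    codebook = np.asarray(codebook, dtype=flat.dtype)
    nearest = np.abs(flat[chosen, None] - codebook[None, :]).argmin(axis=1)
    out = flat.copy()
    out[chosen] = codebook[nearest]

    return out.reshape(weights.shape), mask.reshape(weights.shape)

# Toy layer: half of the six weights get quantized, the rest stay untouched.
w = np.array([[0.9, -0.05, 0.4], [-0.7, 0.1, -0.45]])
codebook = [-1.0, -0.5, 0.0, 0.5, 1.0]
q, mask = partition_and_quantize(w, fraction=0.5, codebook=codebook)
print(q)
print(mask)
```

In a full pipeline, the entries where `mask` is True would presumably be frozen while the remaining weights are re-trained to compensate for the quantization loss, and the step would be repeated with a larger `fraction` until every weight is quantized. That loop is omitted here because the abstract does not specify its schedule.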

Code Implementations: 1 repo