LGNEMLJul 15, 2020

Compression strategies and space-conscious representations for deep neural networks

arXiv:2007.07967v112 citations
AI Analysis

This addresses memory efficiency for deploying deep learning models on devices with limited RAM, but it is incremental as it builds on existing compression techniques.

The paper tackles the problem of deploying large convolutional neural networks on resource-limited platforms by investigating compression strategies, achieving compression rates up to 165 times while preserving or improving model performance.

Recent advances in deep learning have made available large, powerful convolutional neural networks (CNN) with state-of-the-art performance in several real-world applications. Unfortunately, these large-sized models have millions of parameters, thus they are not deployable on resource-limited platforms (e.g. where RAM is limited). Compression of CNNs thereby becomes a critical problem to achieve memory-efficient and possibly computationally faster model representations. In this paper, we investigate the impact of lossy compression of CNNs by weight pruning and quantization, and lossless weight matrix representations based on source coding. We tested several combinations of these techniques on four benchmark datasets for classification and regression problems, achieving compression rates up to $165$ times, while preserving or improving the model performance.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes