LG · AI · ML · Aug 1, 2018

SlimNets: An Exploration of Deep Model Compression and Acceleration

arXiv:1808.00496v1 · 11 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses the need for efficient deployment on resource-constrained devices such as smartphones, but its contribution is incremental because it combines existing methods.

The paper tackles the resource cost of deep neural networks by evaluating and combining compression methods such as pruning and knowledge distillation, producing a compressed network 85 times smaller that retains 96% of the original model's accuracy on CIFAR10.

Deep neural networks have achieved increasingly accurate results on a wide variety of complex tasks. However, much of this improvement is due to the growing use and availability of computational resources (e.g., use of GPUs, more layers, more parameters, etc.). Most state-of-the-art deep networks, despite performing well, over-parameterize approximate functions and take a significant amount of time to train. With increased focus on deploying deep neural networks on resource-constrained devices like smartphones, there has been a push to evaluate why these models are so resource hungry and how they can be made more efficient. This work evaluates and compares three distinct methods for deep model compression and acceleration: weight pruning, low-rank factorization, and knowledge distillation. Comparisons on VGG nets trained on CIFAR10 show that each of the methods is effective on its own, but that the true power lies in combining them. We show that by combining pruning and knowledge distillation methods we can create a compressed network 85 times smaller than the original, all while retaining 96% of the original model's accuracy.
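
As a rough illustration of the two methods the abstract combines, the sketch below shows magnitude-based weight pruning and a temperature-softened distillation loss in plain Python. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the pruning criterion (smallest absolute value), the flat weight list, and the temperature of 4.0 are all hypothetical choices made for this example.

```python
import math

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Assumption: weights are given as a flat list; real networks would
    apply this per layer or globally over tensors.
    """
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices ordered from smallest to largest absolute value
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature parameter."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy between softened teacher and student distributions.

    Assumption: only the soft-target term is shown; a full training loss
    would usually add a hard-label term as well.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

# Prune half of a toy weight vector
pruned = magnitude_prune([0.5, -0.01, 0.2, 0.03], sparsity=0.5)
print(pruned)  # [0.5, 0.0, 0.2, 0.0]

# A student that matches the teacher's logits scores a lower loss
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

In a combined pipeline of the kind the abstract describes, a small or pruned student network would be trained against the teacher's softened outputs; the order and schedule of those steps are not specified on this page.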

Code Implementations: 1 repo
