LG · ML · Jun 12, 2020

Dynamic Model Pruning with Feedback

arXiv:2006.07253v1 · 229 citations
Originality: Highly original
AI Analysis

The paper addresses the memory and latency costs of deploying models on resource-constrained devices, and its approach is novel rather than an incremental improvement.

The paper tackles the problem of deploying deep neural networks on low-end devices by proposing a model compression method that produces sparse models in a single training pass. On CIFAR-10 and ImageNet, the resulting sparse models reach the performance of state-of-the-art dense models without retraining.

Deep neural networks often have millions of parameters. This can hinder their deployment to low-end devices, not only because of high memory requirements but also because of increased inference latency. We propose a novel model compression method that generates a sparse trained model without additional overhead: by (i) allowing dynamic allocation of the sparsity pattern and (ii) incorporating a feedback signal to reactivate prematurely pruned weights, we obtain a performant sparse model in a single training pass (retraining is not needed, but can further improve performance). We evaluate our method on CIFAR-10 and ImageNet and show that the obtained sparse models can reach the state-of-the-art performance of dense models. Moreover, their performance surpasses that of models generated by all previously proposed pruning schemes.
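The abstract names two ingredients: a sparsity pattern that is re-allocated during training, and a feedback signal that lets prematurely pruned weights come back. The following is a minimal sketch of how the two could interact, under stated assumptions. We assume the feedback works by evaluating the gradient at the pruned weights and applying it to the full dense weights (an error-feedback style update), and we use a toy linear-regression model with a magnitude mask recomputed every step instead of a deep network. The names `magnitude_mask`, `mse_grad`, and `train_pruned` are hypothetical; this is not the authors' implementation.

```python
from typing import List, Tuple

Vector = List[float]


def magnitude_mask(weights: Vector, sparsity: float) -> Vector:
    """Return a 0/1 mask keeping the largest-magnitude (1 - sparsity) fraction of weights."""
    n_keep = max(1, round(len(weights) * (1.0 - sparsity)))
    # sorted() is stable, so ties keep their original index order.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(ranked[:n_keep])
    return [1.0 if i in keep else 0.0 for i in range(len(weights))]


def mse_grad(w: Vector, xs: List[Vector], ys: Vector) -> Vector:
    """Gradient of the mean squared error of the linear model y ~ w . x."""
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += 2.0 * err * xj / len(xs)
    return g


def train_pruned(
    xs: List[Vector],
    ys: Vector,
    init: Vector,
    sparsity: float,
    lr: float = 0.5,
    steps: int = 200,
    feedback: bool = True,
) -> Tuple[Vector, Vector]:
    """Train a toy linear model under a magnitude mask recomputed every step.

    feedback=True:  the gradient, evaluated at the pruned weights, updates the
                    dense weights, so pruned entries can grow back into the mask.
    feedback=False: updates are masked, so pruned entries stay frozen.
    (Assumed scheme for illustration only.)
    """
    dense = list(init)
    for _ in range(steps):
        # (i) dynamic sparsity pattern: re-derive the mask from the dense weights
        mask = magnitude_mask(dense, sparsity)
        sparse = [w * m for w, m in zip(dense, mask)]
        # gradient is computed at the pruned model, not at the dense one
        g = mse_grad(sparse, xs, ys)
        if not feedback:
            # contrast case: pruned weights receive no update at all
            g = [gi * m for gi, m in zip(g, mask)]
        # (ii) feedback: the update is applied to every dense weight
        dense = [w - lr * gi for w, gi in zip(dense, g)]
    mask = magnitude_mask(dense, sparsity)
    return [w * m for w, m in zip(dense, mask)], mask
```

With identity inputs, targets `[3, 0, 0, -2]`, initial weights `[0, 0.5, 0.4, 0]`, and 50% sparsity, the initial mask keeps the wrong two entries. With `feedback=True`, the pruned entries keep accumulating gradient, overtake the others after two steps, and the final mask is `[1, 0, 0, 1]`. With `feedback=False`, they stay frozen at zero and the mask never changes.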

