LG · ML · Jun 12, 2020

Dynamic Model Pruning with Feedback

Tao Lin, Sebastian U. Stich, Luis Barba, Daniil Dmitriev, Martin Jaggi

arXiv:2006.07253v1 · 32.3 · 229 citations

Originality: Highly original

AI Analysis

This work addresses the memory and latency problems of deploying models on resource-constrained devices, and it represents a novel approach rather than an incremental improvement.

The paper tackles the problem of deploying deep neural networks on low-end devices. It proposes a model compression method that produces a sparse model in a single training pass and, without retraining, matches the performance of state-of-the-art dense models on CIFAR-10 and ImageNet.

Deep neural networks often have millions of parameters. This can hinder their deployment to low-end devices, not only due to high memory requirements but also because of increased latency at inference. We propose a novel model compression method that generates a sparse trained model without additional overhead: by allowing (i) dynamic allocation of the sparsity pattern and (ii) incorporating a feedback signal to reactivate prematurely pruned weights, we obtain a performant sparse model in a single training pass (retraining is not needed, but can further improve the performance). We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models. Moreover, their performance surpasses that of models generated by all previously proposed pruning schemes.
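The abstract names two ingredients: a sparsity pattern that is re-allocated during training, and a feedback signal that lets pruned weights come back. The Python sketch below is a minimal illustration under our own assumptions, not the authors' implementation. It keeps a dense copy of the weights, recomputes a magnitude-based mask every `mask_every` steps, evaluates the gradient at the pruned weights, and applies that gradient to the dense copy, so pruned entries keep receiving updates. The function names (`magnitude_mask`, `train_dpf_sketch`), the toy quadratic objective, and all hyperparameters are hypothetical.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def magnitude_mask(w: Vector, keep: int) -> Vector:
    """Return a 0/1 mask that keeps the `keep` largest-magnitude entries of w."""
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    kept = set(order[:keep])
    return [1.0 if i in kept else 0.0 for i in range(len(w))]


def train_dpf_sketch(
    w: Vector,
    grad_fn: Callable[[Vector], Vector],
    keep: int,
    lr: float = 0.1,
    steps: int = 200,
    mask_every: int = 5,
) -> Tuple[Vector, Vector, Vector]:
    """Hypothetical sketch of pruning with feedback.

    The gradient is evaluated at the pruned weights (mask * w) but applied
    to the dense weights w, so pruned entries still move and can re-enter
    the mask the next time it is recomputed.
    """
    w = list(w)  # dense weights; never zeroed out
    initial_mask = magnitude_mask(w, keep)
    mask = initial_mask
    for t in range(steps):
        if t % mask_every == 0:
            # Dynamic re-allocation of the sparsity pattern.
            mask = magnitude_mask(w, keep)
        w_pruned = [wi * mi for wi, mi in zip(w, mask)]
        g = grad_fn(w_pruned)  # gradient taken at the pruned model
        # Feedback: the update reaches every dense weight, pruned or not.
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    final_mask = magnitude_mask(w, keep)
    sparse_w = [wi * mi for wi, mi in zip(w, final_mask)]
    return sparse_w, final_mask, initial_mask


# Toy objective (assumption): f(v) = 0.5 * ||v - target||^2, gradient v - target.
target = [0.0, 3.0, 0.0, 0.0, -2.0, 0.0]
w0 = [0.5, 0.01, 0.4, 0.0, -0.02, 0.3]  # the useful entries 1 and 4 start tiny
sparse_w, final_mask, initial_mask = train_dpf_sketch(
    w0, lambda v: [vi - ti for vi, ti in zip(v, target)], keep=2
)
print(initial_mask)  # [1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(final_mask)    # [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
```

In this toy run the initial mask keeps entries 0 and 2, which the objective drives toward zero. Meanwhile, the pruned entries 1 and 4 accumulate updates on the dense copy and are reactivated at the first mask recomputation. If the gradient were instead applied only to the kept weights, entries 1 and 4 would receive no updates while pruned.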

