LG ML · Feb 25, 2019

The State of Sparsity in Deep Neural Networks

arXiv:1902.09574v144.3885 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This work provides large-scale benchmarks for model compression, showing that complex sparsification techniques perform inconsistently at scale and establishing baselines for future research.

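The authors evaluated three state-of-the-art sparsity techniques on Transformer (WMT 2014 English-to-German) and ResNet-50 (ImageNet), finding that simple magnitude pruning achieves comparable or better results than more complex methods. They also found that unstructured sparse architectures learned through pruning, when trained from scratch, cannot match the test set performance of models trained with joint sparsification and optimization.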

We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: Transformer trained on WMT 2014 English-to-German, and ResNet-50 trained on ImageNet. Across thousands of experiments, we demonstrate that complex techniques (Molchanov et al., 2017; Louizos et al., 2017b) shown to yield high compression rates on smaller datasets perform inconsistently, and that simple magnitude pruning approaches achieve comparable or better results. Additionally, we replicate the experiments performed by (Frankle & Carbin, 2018) and (Liu et al., 2018) at scale and show that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test set performance as a model trained with joint sparsification and optimization. Together, these results highlight the need for large-scale benchmarks in the field of model compression. We open-source our code, top performing model checkpoints, and results of all hyperparameter configurations to establish rigorous baselines for future work on compression and sparsification.
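For readers unfamiliar with the term, magnitude pruning removes the weights with the smallest absolute values. The sketch below is a minimal one-shot illustration in plain Python, not the authors' implementation: the function name `magnitude_prune`, the flat-list weight representation, and the example values are all assumptions made for illustration, and the paper's own training procedure may differ (for example, by pruning gradually during training).

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Hypothetical helper for illustration: `weights` is a flat list of floats.
    Returns the pruned weights and a 0/1 mask (1 = weight kept).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    num_to_prune = int(len(weights) * sparsity)
    # Rank weight indices by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:num_to_prune])
    mask = [0 if i in pruned_idx else 1 for i in range(len(weights))]
    # Keep surviving weights unchanged; set pruned ones to exactly 0.0.
    pruned = [w if m else 0.0 for w, m in zip(weights, mask)]
    return pruned, mask


# Example values are made up for illustration.
weights = [0.8, -0.05, 0.3, -1.2, 0.01, 0.4]
pruned, mask = magnitude_prune(weights, sparsity=0.5)
```

In a training setting, the mask would typically be reapplied after each optimizer step so that pruned weights stay at zero.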
