Progressive Weight Pruning of Deep Neural Networks using ADMM
This paper addresses model compression for DNNs on edge devices. It offers a novel approach to achieving extremely high pruning ratios without accuracy degradation, though it is incremental relative to existing pruning methods.
The paper tackles the problem of large model sizes in deep neural networks (DNNs), which hinder deployment on edge computing devices, by proposing a progressive weight pruning method based on ADMM. It achieves pruning rates of up to 34 times on ImageNet and 167 times on MNIST, with faster convergence and higher compression rates under the same number of epochs.
Deep neural networks (DNNs), although achieving human-level performance in many domains, have very large model sizes that hinder their broader application on edge computing devices. Extensive research has been conducted on DNN model compression and pruning. However, most previous work has taken heuristic approaches. This work proposes a progressive weight pruning approach based on ADMM (Alternating Direction Method of Multipliers), a powerful technique for dealing with non-convex optimization problems with potentially combinatorial constraints. Motivated by dynamic programming, the proposed method reaches extremely high pruning rates through a sequence of partial prunings, each with a moderate pruning rate. It thereby resolves the accuracy degradation and long convergence times encountered when pursuing extremely high pruning ratios. It achieves a pruning rate of up to 34 times on the ImageNet dataset and 167 times on the MNIST dataset, significantly higher than those reported in the literature. Under the same number of epochs, the proposed method also achieves faster convergence and higher compression rates. The code and pruned DNN models are released at bit.ly/2zxdlss
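To make the idea of ADMM-based progressive pruning concrete, the following is a minimal sketch on a toy problem, not the released implementation. It assumes a least-squares loss as a stand-in for the DNN training loss, so the ADMM weight update has a closed form, and it assumes that "pruning rate" means the ratio of original to remaining weights (under that reading, 34 times leaves about 2.9% of the weights and 167 times about 0.6%). The function names (project_top_k, admm_prune, progressive_prune), the keep schedule, the penalty rho, and the mask-then-refit step between stages are all illustrative assumptions.

```python
import numpy as np

def project_top_k(v, k):
    """Euclidean projection onto vectors with at most k nonzeros:
    keep the k largest-magnitude entries and zero the rest."""
    k = min(k, v.size)
    z = np.zeros_like(v)
    if k > 0:
        idx = np.argpartition(np.abs(v), -k)[-k:]
        z[idx] = v[idx]
    return z

def admm_prune(X, y, w0, k, rho=1.0, iters=50):
    """ADMM for: minimize (1/2n)||Xw - y||^2 subject to card(w) <= k.
    Splits w (loss variable) from z (sparse copy) with scaled dual u."""
    n, d = X.shape
    A = X.T @ X / n + rho * np.eye(d)   # fixed system matrix for the w-update
    b = X.T @ y / n
    z = project_top_k(w0, k)
    u = np.zeros(d)
    for _ in range(iters):
        # w-update: minimize loss + (rho/2)||w - z + u||^2 (closed form here)
        w = np.linalg.solve(A, b + rho * (z - u))
        # z-update: project onto the sparsity constraint set
        z = project_top_k(w + u, k)
        # dual update
        u = u + w - z
    return z

def progressive_prune(X, y, keep_schedule, rho=1.0, iters=50):
    """Reach a high pruning rate through several moderate partial prunings.
    Each stage prunes only the weights that survived the previous stage."""
    d = X.shape[1]
    support = np.arange(d)                        # indices of surviving weights
    w = np.linalg.lstsq(X, y, rcond=None)[0]      # dense starting point
    for k in keep_schedule:
        z = admm_prune(X[:, support], y, w[support], k, rho, iters)
        support = support[np.flatnonzero(z)]      # mask out pruned weights
        w = np.zeros(d)
        # refit the surviving weights (assumed analogue of masked retraining)
        w[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return w

# Toy data: 64 weights, only 4 of which matter (synthetic, for illustration).
rng = np.random.default_rng(0)
n, d = 200, 64
w_true = np.zeros(d)
w_true[[3, 17, 40, 58]] = [2.0, -3.0, 1.5, 4.0]
X = rng.standard_normal((n, d))
y = X @ w_true + 0.01 * rng.standard_normal(n)

# Each stage halves the number of kept weights: 64 -> 32 -> 16 -> 8 -> 4.
w_pruned = progressive_prune(X, y, keep_schedule=(32, 16, 8, 4))
remaining = np.count_nonzero(w_pruned)
print("remaining weights:", remaining, "of", d)
```

In this sketch each stage applies only a 2 times partial pruning, yet the stages compose to a 16 times overall rate. For an actual DNN the weight update would be a few epochs of gradient-based training on the loss plus the quadratic penalty rather than a linear solve.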