Layer-wise Model Pruning based on Mutual Information
This work addresses the need for efficient neural network compression for deployment in resource-constrained environments. Its method delivers greater speedup and higher task performance than weight-based pruning at the same sparsity level, though the contribution appears incremental because it builds on existing pruning techniques.
The paper proposes a layer-wise pruning strategy based on mutual information. Because pruned representations and weight matrices remain dense, the method avoids irregular memory access, and its top-down procedure gives it a more global perspective than weight-based methods. At the same sparsity level, it yields greater speedup and higher performance than weight-based methods such as magnitude pruning.
The proposed pruning strategy offers two merits over weight-based pruning techniques: (1) it avoids irregular memory access, since representations and matrices can be squeezed into smaller but dense counterparts, leading to greater speedup; (2) as a top-down method, it operates from a more global perspective based on training signals in the top layer and prunes each layer by propagating the effect of those global signals through the layers, leading to better performance at the same sparsity level. Extensive experiments show that, at the same sparsity level, the proposed strategy offers both greater speedup and higher performance than weight-based pruning methods (e.g., magnitude pruning, movement pruning).
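To make merit (1) concrete, the following is a minimal NumPy sketch contrasting the two pruning styles on a toy two-layer network. It is not the paper's algorithm. The histogram-based estimator `mi_score`, the helpers `squeeze_layer` and `magnitude_mask`, the toy data, and the choice to score each hidden unit directly against the labels are all assumptions made for illustration; the paper's top-down propagation of signals through layers is not reproduced here.

```python
import numpy as np

def mi_score(feature, labels, bins=8):
    """Histogram estimate of I(feature; labels) in nats (illustrative estimator)."""
    edges = np.histogram_bin_edges(feature, bins=bins)
    f = np.digitize(feature, edges[1:-1])            # bin index in 0..bins-1
    classes, y = np.unique(labels, return_inverse=True)
    joint = np.zeros((bins, len(classes)))
    np.add.at(joint, (f, y), 1.0)                    # joint counts
    joint /= joint.sum()
    pf = joint.sum(axis=1, keepdims=True)            # marginal over feature bins
    py = joint.sum(axis=0, keepdims=True)            # marginal over classes
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pf @ py)[nz])).sum())

def squeeze_layer(W_in, b_in, W_out, keep):
    """Drop hidden units not in `keep`; the results are smaller but dense."""
    return W_in[:, keep], b_in[keep], W_out[keep, :]

def magnitude_mask(W, sparsity):
    """Unstructured magnitude pruning: zero the smallest weights, shape unchanged."""
    k = int(W.size * sparsity)
    thresh = np.sort(np.abs(W), axis=None)[k]
    return np.where(np.abs(W) >= thresh, W, 0.0)

# Toy data and a random two-layer network (hypothetical sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 16))
labels = (X[:, 0] > 0).astype(int)
W_in = rng.normal(size=(16, 32))
b_in = np.zeros(32)
W_out = rng.normal(size=(32, 2))

H = np.maximum(X @ W_in + b_in, 0.0)                 # hidden representation (ReLU)
scores = np.array([mi_score(H[:, j], labels) for j in range(H.shape[1])])
keep = np.sort(np.argsort(scores)[-16:])             # keep the 16 highest-scoring units

W_in_s, b_in_s, W_out_s = squeeze_layer(W_in, b_in, W_out, keep)
H_s = np.maximum(X @ W_in_s + b_in_s, 0.0)           # dense forward pass on fewer units

W_in_masked = magnitude_mask(W_in, sparsity=0.5)     # same shape, half the entries zeroed
```

After squeezing, `W_in_s` and `W_out_s` are ordinary dense matrices with half as many hidden units, so a standard dense matrix multiply does proportionally less work. `W_in_masked` has the same fraction of zeros but keeps its original shape, so any speedup depends on sparse kernels that can exploit the scattered zeros.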