Importance Estimation with Random Gradient for Neural Network Pruning
This work targets efficient neural network pruning for AI practitioners, but its contribution is incremental: it builds on existing Taylor First Order (TaylorFO) approximation methods.
The authors tackle neural network pruning by proposing a method that estimates neuron importance using random gradients and gradient normalization, eliminating the need for labelled data. Their approach outperforms previous methods on ResNet and VGG architectures on the CIFAR-100 and STL-10 datasets, and it improves existing methods when combined with them.
Global neuron importance estimation is used to prune neural networks for efficiency. To determine the global importance of each neuron or convolutional kernel, most existing methods use activation information, gradient information, or both, which demands abundant labelled examples. In this work, we use heuristics to derive importance estimates similar to those of Taylor First Order (TaylorFO) approximation based methods. We name our methods TaylorFO-abs and TaylorFO-sq. We propose two additional techniques to improve these importance estimation methods. First, we propagate random gradients from the last layer of a network, thus avoiding the need for labelled examples. Second, we normalize the gradient magnitude of the last layer output before propagating, which allows all examples to contribute similarly to the importance score. With these additional techniques, our methods perform better than previous methods when tested on ResNet and VGG architectures on the CIFAR-100 and STL-10 datasets. Furthermore, our method complements existing methods and improves their performance when combined with them.
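The two techniques described above (random gradients at the last layer, normalized per example before propagation) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy two-layer network `y = W2 · relu(W1 · x)`, and it assumes that the "abs" and "sq" scores are the absolute value and the square of activation times gradient, averaged over examples; the abstract does not spell out these definitions, so they may differ from the paper's. All names (`hidden_importance`, `random_unit_gradient`, `W1`, `W2`) are hypothetical.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, x):
    # W is a list of rows; returns W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def random_unit_gradient(n, rng):
    # Random "gradient" for the network output, so no label is needed.
    # It is scaled to unit L2 norm so every example contributes similarly.
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g)) or 1.0
    return [v / norm for v in g]


def hidden_importance(W1, W2, inputs, rng, mode="abs"):
    # Assumed scoring rule: mean over examples of |h_j * g_j| ("abs")
    # or (h_j * g_j) ** 2 ("sq"), where h_j is the activation of hidden
    # neuron j and g_j is the gradient that reaches it.
    n_hidden, n_out = len(W1), len(W2)
    scores = [0.0] * n_hidden
    for x in inputs:
        h = relu(matvec(W1, x))
        g_out = random_unit_gradient(n_out, rng)
        # Backpropagate through y = W2 @ h: dL/dh_j = sum_k W2[k][j] * g_out[k]
        g_h = [sum(W2[k][j] * g_out[k] for k in range(n_out))
               for j in range(n_hidden)]
        for j in range(n_hidden):
            t = h[j] * g_h[j]
            scores[j] += abs(t) if mode == "abs" else t * t
    return [s / len(inputs) for s in scores]


rng = random.Random(0)
W1 = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
W1[2] = [0.0] * 4  # a dead neuron: its activation is always zero
W2 = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(3)]
inputs = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(32)]  # unlabelled

scores_abs = hidden_importance(W1, W2, inputs, rng, mode="abs")
scores_sq = hidden_importance(W1, W2, inputs, rng, mode="sq")

# Neurons sorted from least to most important; prune from the front.
prune_order = sorted(range(len(scores_abs)), key=lambda j: scores_abs[j])
```

In this toy setup the dead neuron receives a score of zero under both variants, so it is the first candidate for removal. In a real framework the same idea would presumably be expressed by passing a random, normalized tensor as the upstream gradient of the network output during backpropagation, rather than by hand-written loops.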