Improve Convolutional Neural Network Pruning by Maximizing Filter Variety
This work addresses suboptimal filter selection in neural network pruning, offering researchers and practitioners an incremental improvement: a filter selection technique that can be appended to existing pruning criteria.
The paper tackles structured pruning of convolutional neural networks by introducing a technique that maximizes filter variety, so that rare, discriminative filters are not removed and redundant ones are not retained. At similar sparsity levels, it achieves higher performance on CIFAR-10, CIFAR-100, and CALTECH-101 with the VGG-16 and ResNet-18 architectures.
Neural network pruning is a widely used strategy for reducing model storage and computing requirements. It lowers the complexity of the network by introducing sparsity in the weights. Because taking advantage of sparse matrices is still challenging, pruning is often performed in a structured way, i.e. by removing entire convolution filters in the case of ConvNets, according to a chosen pruning criterion. Common pruning criteria, such as l1-norm or movement, usually do not consider the individual utility of filters, which may lead to: (1) the removal of filters exhibiting rare, and thus important and discriminative, behaviour, and (2) the retention of filters carrying redundant information. In this paper, we present a technique that solves both issues and can be appended to any pruning criterion. This technique ensures that the selection criterion focuses on redundant filters while retaining the rare ones, thus maximizing the variety of the remaining filters. The experimental results, obtained on different datasets (CIFAR-10, CIFAR-100 and CALTECH-101) and with different architectures (VGG-16 and ResNet-18), demonstrate that it is possible to achieve similar sparsity levels while maintaining higher performance when our filter selection technique is appended to pruning criteria. Moreover, we assess the quality of the sparse sub-networks found by applying the Lottery Ticket Hypothesis, and find that adding our method leads to the discovery of better performing tickets in most cases.
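The two failure modes described above can be illustrated with a toy example. The sketch below is not the method of the paper, whose details are not given in this abstract. It compares plain l1-norm filter selection with a hypothetical variant that first restricts the candidates to filters having a near-duplicate among the remaining ones. The function names, the use of cosine similarity as a redundancy measure, and the `sim_threshold` value are all assumptions made for illustration.

```python
import math

def l1_norm(f):
    # l1-norm of a flattened filter (sum of absolute weights).
    return sum(abs(w) for w in f)

def cosine(a, b):
    # Cosine similarity between two flattened filters; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def prune_l1(filters, n_prune):
    # Baseline criterion: indices of the n_prune filters with the smallest l1-norm.
    order = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]))
    return sorted(order[:n_prune])

def prune_l1_redundant_first(filters, n_prune, sim_threshold=0.95):
    # Hypothetical variant (an assumption, not the paper's algorithm):
    # at each step, only filters that still have a near-duplicate among the
    # kept filters are candidates; the l1 criterion then picks among them.
    # If no filter is redundant, fall back to the plain l1 criterion.
    kept = list(range(len(filters)))
    pruned = []
    for _ in range(n_prune):
        redundant = [
            i for i in kept
            if any(j != i and cosine(filters[i], filters[j]) >= sim_threshold
                   for j in kept)
        ]
        candidates = redundant if redundant else kept
        victim = min(candidates, key=lambda i: l1_norm(filters[i]))
        kept.remove(victim)
        pruned.append(victim)
    return sorted(pruned)

# Toy layer with four flattened filters (made-up values):
# filters 0 and 1 are near-duplicates, filter 2 is rare but has a small norm.
toy_filters = [
    [1.0, 0.0, 0.0, 0.0],
    [1.1, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.0],
    [0.0, 2.0, 0.0, 2.0],
]

print(prune_l1(toy_filters, 1))                  # [2]: the rare filter is removed
print(prune_l1_redundant_first(toy_filters, 1))  # [0]: one of the duplicates is removed
```

In this toy layer the plain l1 criterion removes the only filter responding to the third input position, while the redundancy-aware variant removes one of the two near-identical filters and keeps the rare one. Once no redundant pair remains, the variant behaves exactly like the underlying criterion.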