Projection-Free CNN Pruning via Frank-Wolfe with Momentum: Sparser Models with Less Pretraining
This work addresses efficient model compression for CNNs on image tasks and offers a method that reduces the amount of pre-training required. The contribution is incremental, since it builds on existing pruning and optimization techniques.
The paper tackles pruning of convolutional neural networks for image classification by comparing Frank-Wolfe (FW) optimization methods. It finds that FW with momentum produces sparser and more accurate models than the baselines, and does so after only a few epochs of pre-training rather than full training.
We investigate algorithmic variants of the Frank-Wolfe (FW) optimization method for pruning convolutional neural networks. This is motivated by the "Lottery Ticket Hypothesis", which suggests that larger pre-trained networks contain smaller sub-networks that perform comparably well (if not better). Whilst most literature in this area focuses on Deep Neural Networks more generally, we specifically consider Convolutional Neural Networks for image classification tasks. Building on the hypothesis, we compare simple magnitude-based pruning, a Frank-Wolfe style pruning scheme, and an FW method with momentum on a CNN trained on MNIST. Our experiments track test accuracy, loss, sparsity, and inference time as we vary the dense pre-training budget from 1 to 10 epochs. We find that FW with momentum yields pruned networks that are both sparser and more accurate than the original dense model and the simple pruning baselines, while incurring minimal inference-time overhead in our implementation. Moreover, FW with momentum reaches these accuracies after only a few epochs of pre-training, indicating that full pre-training of the dense model is not required in this setting.
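To make the compared schemes concrete, the following is a minimal NumPy sketch of magnitude-based pruning and of a Frank-Wolfe loop with momentum. It is illustrative only and not the implementation used in the experiments: the abstract does not state the constraint set, the momentum rule, or the step size, so the L1 ball, the exponentially averaged gradient with weight `rho`, the step size 2/(t+2), and all function names (`magnitude_prune`, `lmo_l1`, `fw_momentum`) are assumptions.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the fraction `sparsity` of entries of w with smallest magnitude."""
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    flat = np.abs(w).ravel()
    idx = np.argpartition(flat, k - 1)[:k]  # indices of the k smallest magnitudes
    out = w.copy().ravel()
    out[idx] = 0.0
    return out.reshape(w.shape)

def lmo_l1(grad, radius):
    """Linear minimization oracle for the L1 ball (assumed constraint set).

    Returns a vertex minimizing <grad, v> over ||v||_1 <= radius;
    it has at most one nonzero entry.
    """
    v = np.zeros_like(grad)
    i = np.argmax(np.abs(grad))  # flat index of the largest-magnitude coordinate
    v.flat[i] = -radius * np.sign(grad.flat[i])
    return v

def fw_momentum(grad_fn, w0, radius, steps=100, rho=0.9):
    """Frank-Wolfe with an exponentially averaged gradient (assumed momentum rule)."""
    w = w0.copy()
    m = np.zeros_like(w0)
    for t in range(steps):
        g = grad_fn(w)
        m = rho * m + (1.0 - rho) * g      # momentum: running average of gradients
        v = lmo_l1(m, radius)              # sparse vertex, no projection needed
        gamma = 2.0 / (t + 2)              # standard FW step size (assumption)
        w = (1.0 - gamma) * w + gamma * v  # convex combination stays feasible
    return w

# Toy usage on a least-squares objective (hypothetical data, not MNIST).
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 10))
b = rng.normal(size=20)
w_fw = fw_momentum(lambda x: A.T @ (A @ x - b), np.zeros(10), radius=1.0, steps=5)
```

Because every iterate is a convex combination of one-sparse vertices, the number of nonzero weights grows by at most one per step, which is the mechanism by which a Frank-Wolfe scheme over such a set can induce sparsity without a projection step.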