ML AI LG OC

Jul 31, 2021

How much pre-training is enough to discover a good subnetwork?

Cameron R. Wolfe, Fangshuo Liao, Qihan Wang, Junhyung Lyle Kim, Anastasios Kyrillidis

arXiv:2108.00259v3

10.24 citations

Originality: Incremental advance

AI Analysis

This work addresses the computational cost of pruning, a concern for researchers and practitioners alike, but the advance is incremental: it extends existing pruning methods with a theoretical analysis.

The authors ask how much pre-training is needed before neural network pruning yields high-performing subnetworks. They derive a theoretical bound showing that the required number of pre-training iterations scales logarithmically with dataset size, and they validate it on MNIST with a multi-layer perceptron.

Neural network pruning is useful for discovering efficient, high-performing subnetworks within pre-trained, dense network architectures. More often than not, it involves a three-step process -- pre-training, pruning, and re-training -- that is computationally expensive, as the dense model must be fully pre-trained. While previous work has revealed through experiments the relationship between the amount of pre-training and the performance of the pruned network, a theoretical characterization of such dependency is still missing. Aiming to mathematically analyze the amount of dense network pre-training needed for a pruned network to perform well, we discover a simple theoretical bound in the number of gradient descent pre-training iterations on a two-layer, fully-connected network, beyond which pruning via greedy forward selection [61] yields a subnetwork that achieves good training error. Interestingly, this threshold is shown to be logarithmically dependent upon the size of the dataset, meaning that experiments with larger datasets require more pre-training for subnetworks obtained via pruning to perform well. Lastly, we empirically validate our theoretical results on a multi-layer perceptron trained on MNIST.
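To make the pruning step in the abstract concrete, here is a minimal sketch of greedy forward selection on a two-layer network. It is not the authors' code and rests on several assumptions that the abstract does not state: the subnetwork's output is taken to be the average of the selected neurons' outputs, neurons may be selected more than once, the activation is ReLU, and the loss is a squared error. The function names `neuron_outputs` and `greedy_forward_selection` are hypothetical.

```python
import numpy as np


def neuron_outputs(X, W, a):
    """Per-neuron outputs of a two-layer ReLU network.

    X: (n_samples, d) inputs, W: (n_neurons, d) hidden weights,
    a: (n_neurons,) output weights. Returns (n_samples, n_neurons).
    """
    return np.maximum(X @ W.T, 0.0) * a


def greedy_forward_selection(H, y, k):
    """Greedily pick k neurons (with replacement, an assumption here)
    so that the average of their outputs fits y under squared loss."""
    n_samples, _ = H.shape
    selected = []
    running_sum = np.zeros(n_samples)
    for t in range(1, k + 1):
        # Output of the subnetwork if each candidate neuron were added next.
        candidates = (running_sum[:, None] + H) / t
        losses = 0.5 * np.mean((candidates - y[:, None]) ** 2, axis=0)
        j = int(np.argmin(losses))  # neuron that lowers the loss most
        selected.append(j)
        running_sum += H[:, j]
    final_loss = 0.5 * np.mean((running_sum / k - y) ** 2)
    return selected, final_loss
```

In this sketch, `H` would come from a dense network after some number of gradient descent pre-training iterations. Calling `greedy_forward_selection(H, y, k)` at different pre-training checkpoints is one way to observe how subnetwork training error depends on the amount of pre-training.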
