The Propensity for Density in Feed-forward Models
This work addresses the efficiency and interpretability of neural network training and is aimed at researchers and practitioners, though it may appear incremental because it builds on existing pruning methods.
The study investigates whether neural networks use all available weights during training. It finds that the proportion of weights that can be pruned without performance loss is largely invariant to model size, with substantial prunability observed across a range of models in which the widest is 50 times as wide as the narrowest.
Does the process of training a neural network to solve a task tend to use all of the available weights, even when the task could be solved with fewer weights? To address this question, we study the effects of pruning fully connected, convolutional, and residual models while varying their widths. We find that the proportion of weights that can be pruned without degrading performance is largely invariant to model size. Increasing a model's width has little effect on the density of the pruned model compared with the resulting increase in the absolute size of the pruned network. In particular, we find substantial prunability across a large range of model sizes, where our biggest model is 50 times as wide as our smallest model. We explore three hypotheses that could explain these findings.
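To make the distinction between density and absolute size concrete, the following is a minimal sketch, not the procedure used in this work. It assumes global magnitude pruning (the abstract does not name a pruning criterion), a hypothetical one-hidden-layer fully connected model with 10 inputs and 10 outputs, and random weights standing in for trained ones. The names `magnitude_prune`, `density`, and `layer_size` are illustrative only. The kept fraction is fixed by hand here, so the sketch shows only how the two quantities are measured and says nothing about how much pruning a trained model tolerates.

```python
import random


def magnitude_prune(weights, keep_fraction):
    """Zero all but the largest-magnitude `keep_fraction` of the weights.

    Assumption: magnitude is used as the pruning criterion for illustration.
    """
    n_keep = round(len(weights) * keep_fraction)
    # Rank indices by descending absolute value and keep the top n_keep.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:n_keep])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def density(weights):
    """Fraction of weights that are nonzero after pruning."""
    return sum(1 for w in weights if w != 0.0) / len(weights)


def layer_size(width, n_in=10, n_out=10):
    """Weight count of a hypothetical one-hidden-layer net (biases ignored)."""
    return n_in * width + width * n_out


if __name__ == "__main__":
    rng = random.Random(0)
    # Two widths that differ by a factor of 50, mirroring the range in the text.
    for width in (8, 400):
        weights = [rng.gauss(0.0, 1.0) for _ in range(layer_size(width))]
        pruned = magnitude_prune(weights, keep_fraction=0.2)
        remaining = sum(1 for w in pruned if w != 0.0)
        print(f"width={width} total={len(weights)} "
              f"remaining={remaining} density={density(pruned):.2f}")
```

In this toy setting, both widths end at the same density while the number of remaining weights differs by the same factor of 50 as the widths. That is the pattern the abstract describes: roughly constant density, with the absolute size of the pruned network growing as the model widens.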