LG CV ML

Jun 18, 2020

On the Predictability of Pruning Across Scales

Jonathan S. Rosenfeld, Jonathan Frankle, Michael Carbin, Nir Shavit

arXiv:2006.10621v3

18.647 citations

Originality: Incremental advance

AI Analysis

The paper provides a framework for reasoning about unstructured pruning, which matters as large neural networks become costlier to train. The advance is incremental because it builds on existing pruning methods.

The authors address the problem of predicting the error of iteratively magnitude-pruned neural networks. They show that the error follows a scaling law with interpretable coefficients, and that this functional form holds across depths, widths, dataset sizes, and densities, including on large-scale data such as ImageNet and architectures such as ResNets.

We show that the error of iteratively magnitude-pruned networks empirically follows a scaling law with interpretable coefficients that depend on the architecture and task. We functionally approximate the error of the pruned networks, showing it is predictable in terms of an invariant tying width, depth, and pruning level, such that networks of vastly different pruned densities are interchangeable. We demonstrate the accuracy of this approximation over orders of magnitude in depth, width, dataset size, and density. We show that the functional form holds (generalizes) for large scale data (e.g., ImageNet) and architectures (e.g., ResNets). As neural networks become ever larger and costlier to train, our findings suggest a framework for reasoning conceptually and analytically about a standard method for unstructured pruning.
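
For readers unfamiliar with the method the abstract refers to, the following is a minimal sketch of the masking step in iterative magnitude pruning, assuming a single weight array, global (unstructured) pruning, and a fixed 20% pruning fraction per round. The function names, the pruning fraction, and the random weights are illustrative assumptions, not the authors' code, and the retraining that a real pipeline performs between rounds is omitted. The sketch also does not implement the paper's scaling law, whose functional form is not given in this listing.

```python
import numpy as np


def magnitude_prune(weights, mask, fraction):
    """Drop the smallest-magnitude `fraction` of the weights still kept by `mask`."""
    surviving = np.flatnonzero(mask)  # flat indices of weights not yet pruned
    n_prune = int(round(fraction * surviving.size))
    if n_prune == 0:
        return mask.copy()
    magnitudes = np.abs(weights.ravel()[surviving])
    # Flat indices of the n_prune smallest surviving weights.
    drop = surviving[np.argsort(magnitudes)[:n_prune]]
    new_mask = mask.copy().ravel()
    new_mask[drop] = False
    return new_mask.reshape(mask.shape)


def iterative_magnitude_pruning(weights, rounds, fraction=0.2):
    """Apply `rounds` pruning steps and record the density after each one."""
    mask = np.ones(weights.shape, dtype=bool)
    densities = [1.0]
    for _ in range(rounds):
        # Assumption: a real pipeline would retrain the masked network here.
        mask = magnitude_prune(weights, mask, fraction)
        densities.append(float(mask.mean()))
    return mask, densities


# Hypothetical weights standing in for a trained layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(10, 10))
mask, densities = iterative_magnitude_pruning(w, rounds=3)
print(densities)
```

Each round removes a fixed fraction of the weights that remain, so density shrinks roughly geometrically. This is why pruning studies can cover orders of magnitude in density with a modest number of rounds.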
