i-SpaSP: Structured Neural Pruning via Sparse Signal Recovery
The paper addresses the problem of reducing neural network size for efficient deployment. It offers a computationally efficient pruning method with provable guarantees, though the contribution appears incremental because it builds on existing pruning and sparse recovery ideas.
The paper tackles neural network pruning by proposing i-SpaSP, a structured pruning algorithm inspired by sparse signal recovery. The algorithm iteratively identifies important parameter groups and then thresholds them according to a pre-defined pruning ratio. The authors show that the pruning error decays polynomially and report that i-SpaSP improves pruning efficiency by several orders of magnitude over provable baselines, with experiments on MNIST, ImageNet, and XNLI using architectures such as ResNet34 and BERT.
We propose a novel structured pruning algorithm for neural networks: the iterative, Sparse Structured Pruning algorithm, dubbed i-SpaSP. Inspired by ideas from sparse signal recovery, i-SpaSP operates by iteratively identifying a larger set of important parameter groups (e.g., filters or neurons) within a network that contribute most to the residual between pruned and dense network output, then thresholding these groups based on a smaller, pre-defined pruning ratio. For both two-layer and multi-layer network architectures with ReLU activations, we show that the error induced by pruning with i-SpaSP decays polynomially, where the degree of this polynomial becomes arbitrarily large based on the sparsity of the dense network's hidden representations. In our experiments, i-SpaSP is evaluated across a variety of datasets (i.e., MNIST, ImageNet, and XNLI) and architectures (i.e., feedforward networks, ResNet34, MobileNetV2, and BERT), where it is shown to discover high-performing sub-networks and improve upon the pruning efficiency of provable baseline methodologies by several orders of magnitude. Put simply, i-SpaSP is easy to implement with automatic differentiation, achieves strong empirical results, comes with theoretical convergence guarantees, and is efficient, thus distinguishing itself as one of the few computationally efficient, practical, and provable pruning algorithms.
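The identify-then-threshold loop described above can be illustrated with a minimal NumPy sketch for a two-layer ReLU network. This is not the authors' implementation. The function names (`neuron_scores`, `prune_hidden_neurons`), the choice of `2 * s` candidates per iteration, and the thresholding rule (re-scoring the merged set against the dense output) are assumptions made for illustration. The score used here is the magnitude of the gradient of the squared residual with respect to a hypothetical per-neuron gate, written in closed form rather than obtained through automatic differentiation.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def neuron_scores(H, W2, R):
    """Score each hidden neuron against a residual R.

    score_j = | sum_{n,k} H[n, j] * R[n, k] * W2[j, k] |, which is the
    magnitude of the gradient of 0.5 * ||R||^2 with respect to a
    multiplicative gate on neuron j (an assumed importance measure).
    """
    return np.abs(np.einsum("nj,nk,jk->j", H, R, W2))


def prune_hidden_neurons(X, W1, W2, s, num_iters=5):
    """Select s hidden neurons of the network relu(X @ W1) @ W2.

    Illustrative sketch only: the candidate-set size and the
    thresholding rule are assumptions, not taken from the paper.
    """
    H = relu(X @ W1)          # dense hidden representations
    Y = H @ W2                # dense network output
    S = np.array([], dtype=int)  # indices of currently kept neurons

    for _ in range(num_iters):
        # Residual between dense output and current pruned output.
        R = Y - H[:, S] @ W2[S, :]

        # Step 1: identify a larger set of neurons that contribute
        # most to the residual (here, the 2 * s highest scores).
        scores = neuron_scores(H, W2, R)
        candidates = np.argsort(scores)[::-1][: 2 * s]

        # Step 2: merge with the kept set, then threshold back to s
        # neurons by scoring the merged set against the dense output.
        merged = np.union1d(S, candidates).astype(int)
        merged_scores = neuron_scores(H[:, merged], W2[merged, :], Y)
        keep = np.argsort(merged_scores)[::-1][:s]
        S = np.sort(merged[keep])

    return S
```

Calling `prune_hidden_neurons(X, W1, W2, s)` returns the sorted indices of the retained hidden neurons; the pruned network is then `relu(X @ W1[:, S]) @ W2[S, :]`. Extending the sketch to filters or to multi-layer networks would require a per-layer version of the same loop, which is not shown here.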