LGITMLJun 14, 2020

Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization is Sufficient

arXiv:2006.07990v2116 citations
Originality Highly original
AI Analysis

This addresses a theoretical gap in neural network pruning for researchers, providing a more efficient and near-optimal bound that aligns better with experimental findings.

The paper tackles the problem of reducing the over-parameterization needed to approximate target neural networks via pruning under the strong lottery ticket hypothesis, showing that a logarithmic factor in width suffices, which is an exponential improvement over prior polynomial requirements and is optimal for constant depth networks.

The strong {\it lottery ticket hypothesis} (LTH) postulates that one can approximate any target neural network by only pruning the weights of a sufficiently over-parameterized random network. A recent work by Malach et al. \cite{MalachEtAl20} establishes the first theoretical analysis for the strong LTH: one can provably approximate a neural network of width $d$ and depth $l$, by pruning a random one that is a factor $O(d^4l^2)$ wider and twice as deep. This polynomial over-parameterization requirement is at odds with recent experimental research that achieves good approximation with networks that are a small factor wider than the target. In this work, we close the gap and offer an exponential improvement to the over-parameterization requirement for the existence of lottery tickets. We show that any target network of width $d$ and depth $l$ can be approximated by pruning a random network that is a factor $O(\log(dl))$ wider and twice as deep. Our analysis heavily relies on connecting pruning random ReLU networks to random instances of the \textsc{SubsetSum} problem. We then show that this logarithmic over-parameterization is essentially optimal for constant depth networks. Finally, we verify several of our theoretical insights with experiments.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes