Selection Plateau and a Sparsity-Dependent Hierarchy of Pruning Features
For researchers in neural network pruning, this work explains why diverse pruning methods cluster in performance and provides a principle for designing sparsity-adaptive selection algorithms.
The paper identifies a Selection Plateau in one-shot pruning, where all rank-monotone weight scorers yield identical accuracy at fixed sparsity, and proposes the Sparsity-Information-Complexity Spectrum (SICS) hypothesis, which posits that escaping the plateau requires sparsity-dependent feature complexity. On ViT-Small/CIFAR-10, smooth non-monotone features achieve +6.6% escape at S=0.7, while only raw features with high-frequency wiggle escape at S=0.8 (+2.6%).
We identify a Selection Plateau phenomenon in one-shot neural network pruning: all rank-monotone weight scorers converge to identical accuracy at fixed sparsity, independent of functional form. We propose the Sparsity-Information-Complexity Spectrum (SICS) hypothesis: a sparsity-dependent minimum feature complexity kappa(S) governs plateau escape, with kappa=0 sufficient at low sparsity (S<0.65), kappa=1 dominant at critical sparsity (S~0.7), and kappa=2 necessary at extreme sparsity (S>0.75). On ViT-Small/CIFAR-10, testing nine feature classes across four sparsities, smooth non-monotone features provide +6.6% escape at S=0.7, while only raw features with high-frequency wiggle escape at S=0.8 (+2.6%). A fake non-monotone scorer underperforms the gradient baseline, indicating the requirement is magnitude-independent non-monotonicity. A handcrafted Gaussian bump achieves only +0.006 escape vs. chaos-derived +0.046, indicating rank-alignment is necessary but insufficient. SICS provides a unifying explanation for the performance clustering of diverse pruning methods and suggests that future selection algorithms should adapt feature complexity to target sparsity.
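The selection-level intuition behind the plateau can be illustrated with a minimal NumPy sketch. It is not the paper's experimental setup: the synthetic weights, the helper `prune_mask`, and the particular scorers are assumptions chosen for illustration. It shows only that strictly increasing transforms of weight magnitude rank weights identically, so they keep exactly the same weights at a fixed sparsity, while a non-monotone scorer selects a different set.

```python
import numpy as np


def prune_mask(scores, sparsity):
    """Return a boolean mask keeping the top (1 - sparsity) fraction by score."""
    n_keep = int(round(scores.size * (1.0 - sparsity)))
    # Sort ascending, then reverse so the highest scores come first.
    order = np.argsort(scores, kind="stable")[::-1]
    mask = np.zeros(scores.size, dtype=bool)
    mask[order[:n_keep]] = True
    return mask


# Synthetic stand-in for a layer's weights (assumption, not the paper's model).
rng = np.random.default_rng(0)
weights = rng.normal(size=10_000)
mag = np.abs(weights)
sparsity = 0.7

# Strictly increasing functions of |w|: different forms, same ranking.
monotone_scorers = {
    "magnitude": mag,
    "squared": mag ** 2,
    "log1p": np.log1p(mag),
}
# Hypothetical non-monotone scorer: an oscillating reweighting of |w|.
non_monotone = mag * (1.0 + 0.5 * np.sin(8.0 * mag))

reference = prune_mask(monotone_scorers["magnitude"], sparsity)
same_as_reference = {
    name: bool(np.array_equal(prune_mask(s, sparsity), reference))
    for name, s in monotone_scorers.items()
}
non_monotone_mask = prune_mask(non_monotone, sparsity)
overlap = float((non_monotone_mask & reference).sum() / reference.sum())

print(same_as_reference)
print(f"non-monotone overlap with magnitude mask: {overlap:.3f}")
```

Identical masks necessarily give identical pruned networks, which is the trivial case of the plateau. Whether a different mask actually improves accuracy, and at which sparsity, is the empirical question the abstract addresses; this sketch does not measure accuracy.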