LG Sep 30, 2025

Effective Model Pruning

Yixuan Wang, Dan Guralnik, Saiedeh Akbari, Warren Dixon

arXiv:2509.25606v1 · 1 citation · h-index: 7

Originality: Incremental advance

AI Analysis

The paper addresses model compression for AI practitioners by providing a parameter-free rule that works across different models and pruning criteria. The advance is incremental, since it builds on existing pruning methods.

The paper tackles the problem of determining how many parameters to keep in model pruning by introducing Effective Model Pruning (EMP), a universal adaptive threshold that can be applied to any pruning criterion, resulting in sparse models with performance comparable to dense networks across various architectures like MLPs, CNNs, Transformers/LLMs, and KAN.

We introduce Effective Model Pruning (EMP), a context-agnostic, parameter-free rule addressing a fundamental question about pruning: how many entries to keep. EMP does not prescribe how to score the parameters or prune the models; instead, it supplies a universal adaptive threshold that can be applied to any pruning criterion (weight magnitude, attention score, KAN importance score, or even feature-level signals such as image pixels) and used on structural parts or weights of the models. Given any score vector s, EMP maps s to a built-in effective number N_eff, which is inspired by the Inverse Simpson index of contributors. In our experiments, retaining the N_eff highest-scoring entries and zeroing the remainder yields sparse models with performance comparable to the original dense networks across MLPs, CNNs, Transformers/LLMs, and KANs. By leveraging the geometry of the simplex, we derive a tight lower bound on the preserved mass s_eff (the sum of retained scores) over the corresponding ordered probability simplex associated with the score vector s. We further verify the effectiveness of N_eff by pruning the model with a scaled threshold β · N_eff across a variety of criteria and models. Experiments suggest that the default β = 1 yields a robust threshold for model pruning, while β ≠ 1 still serves as an optional adjustment to meet specific sparsity requirements.
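The rule described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract only says that N_eff is inspired by the Inverse Simpson index, so the sketch assumes the standard form N_eff = 1 / Σ p_i² with p = |s| / Σ|s|. It also assumes the keep count is obtained by rounding beta · N_eff to the nearest integer. The function names `effective_number` and `emp_mask` are hypothetical.

```python
import numpy as np


def effective_number(scores):
    """Inverse-Simpson-style effective number of a score vector.

    Assumption (not confirmed by the abstract): scores are made non-negative
    with abs(), normalized to p = s / sum(s), and N_eff = 1 / sum(p_i ** 2).
    """
    s = np.abs(np.asarray(scores, dtype=float)).ravel()
    total = s.sum()
    if total == 0.0:
        return 0.0  # all-zero scores: nothing contributes
    p = s / total
    return float(1.0 / np.sum(p ** 2))


def emp_mask(scores, beta=1.0):
    """Boolean mask keeping the round(beta * N_eff) highest-scoring entries.

    Rounding to the nearest integer is an assumption of this sketch.
    Ties at the cut-off are broken arbitrarily.
    """
    s = np.abs(np.asarray(scores, dtype=float))
    flat = s.ravel()
    k = int(round(beta * effective_number(flat)))
    k = max(0, min(k, flat.size))  # clamp to a valid keep count
    mask = np.zeros(flat.size, dtype=bool)
    if k > 0:
        # Indices of the k largest scores (unordered).
        top = np.argpartition(flat, flat.size - k)[flat.size - k:]
        mask[top] = True
    return mask.reshape(s.shape)


# Magnitude pruning of a toy weight vector.
weights = np.array([4.0, -1.0, 1.0, -1.0, 1.0])
mask = emp_mask(weights)          # N_eff = 64 / 20 = 3.2, so 3 entries are kept
pruned = weights * mask           # zero the remainder
preserved_mass = np.abs(pruned).sum() / np.abs(weights).sum()  # 6 / 8 = 0.75
```

Because the mask depends only on the score vector, the same helper could be applied to other criteria (for example attention scores) by passing those scores instead of weight magnitudes; `beta` plays the role of the optional scaling factor β.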
