LG · AI · May 17

LEAP: Learnable End-to-End Adaptive Pruning of Large Language Models

Mohammad Mozaffari, Younes Hourri, Mohammad Rastegari, Mahyar Najibi

arXiv:2605.1728987.0

Predicted impact: top 10% in LG · last 90 days · Originality: Highly original

AI Analysis

For practitioners who need highly sparse LLMs with minimal accuracy loss, LEAP provides a practical end-to-end alternative to layer-wise pruning methods.

LEAP introduces a tractable end-to-end method for unstructured LLM pruning that uses a per-weight Bernoulli-via-Gumbel-sigmoid relaxation. At 50% and 60% sparsity across models from 0.5B to 8B parameters, it improves average zero-shot accuracy by +2.59 points over ADMM.
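The per-weight Bernoulli-via-Gumbel-sigmoid relaxation can be sketched in NumPy as follows. This is a minimal illustration of the standard Gumbel-sigmoid (binary concrete) trick, assuming one learnable logit per weight whose sigmoid is that weight's keep probability; the helper names, the temperature `tau`, and the 0.5 threshold are hypothetical choices, not LEAP's published implementation. In actual mask training the sampling would run inside an autodiff framework so that gradients of the end-to-end loss reach the logits.

```python
import numpy as np


def gumbel_sigmoid_mask(logits, tau=1.0, hard=False, rng=None):
    """Relaxed per-weight Bernoulli mask (Gumbel-sigmoid / binary concrete).

    Hypothetical helper for illustration; not LEAP's code.
    logits: one mask logit per weight; sigmoid(logits) is the keep probability.
    tau:    temperature; smaller values push samples toward 0 or 1.
    hard:   if True, threshold the relaxed sample at 0.5 to get a binary mask.
    """
    rng = np.random.default_rng() if rng is None else rng
    # The difference of two Gumbel(0, 1) samples is Logistic(0, 1) noise,
    # drawn here as log(u) - log(1 - u) with u ~ Uniform(0, 1).
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=np.shape(logits))
    noise = np.log(u) - np.log1p(-u)
    soft = 1.0 / (1.0 + np.exp(-(logits + noise) / tau))
    if hard:
        # Binary mask; an autodiff version would typically pair this with a
        # straight-through gradient (an assumption, not stated in the source).
        return (soft > 0.5).astype(soft.dtype)
    return soft


def expected_density(logits):
    """Expected fraction of kept weights: mean of the Bernoulli keep probabilities."""
    return float(np.mean(1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))))


# Usage: mask a random weight matrix with keep probability 0.5 per weight.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
logits = np.zeros_like(W)
mask = gumbel_sigmoid_mask(logits, tau=0.5, hard=True, rng=rng)
W_sparse = W * mask
print("kept fraction:", mask.mean(), "expected:", expected_density(logits))
```

Lowering `tau` makes the soft mask closer to binary, and `expected_density` gives a differentiable quantity that a training objective could push toward a target such as 50% or 60% sparsity; how LEAP actually enforces its sparsity level is not described in this summary.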

Unstructured sparsity is now natively accelerated by recent GPU kernels and dataflow hardware, shifting the bottleneck from inference execution to the pruning algorithm. State-of-the-art methods for unstructured LLM pruning are layer-wise surrogates derived from the Optimal Brain Surgeon principle, and they sacrifice end-to-end accuracy, especially under aggressive sparsity. End-to-end alternatives such as MaskLLM and PATCH show that learnable masks can close this gap, but their categorical-over-patterns parameterization scales with the number of valid masks per row and does not port to the unstructured setting. We introduce LEAP, which replaces this intractable parameterization with a per-weight Bernoulli-via-Gumbel-sigmoid relaxation that makes end-to-end unstructured mask learning tractable. Across five LLM families from 0.5B to 8B parameters at 50% and 60% sparsity, LEAP improves six-task average zero-shot accuracy by +2.59 points on average over ADMM, the best layer-wise baseline in our sweep.
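The scaling argument against a categorical over patterns can be made concrete by counting candidate masks. The sketch below is an illustrative calculation, not code from LEAP, MaskLLM, or PATCH; the helper names and the row width of 4096 are hypothetical. A categorical over a semi-structured 2:4 group has only 6 outcomes, whereas a categorical over every valid 50%-sparse mask of one unstructured row would need one entry per mask, a count with over a thousand decimal digits. A per-weight Bernoulli needs just one logit per weight.

```python
from math import comb


def nm_group_pattern_count(n: int, m: int) -> int:
    """Candidate masks for one N:M group (keep n of every m consecutive weights)."""
    return comb(m, n)


def unstructured_row_mask_count(d: int, sparsity: float) -> int:
    """Candidate masks for one row of d weights with round(sparsity * d) zeros."""
    zeros = round(sparsity * d)
    return comb(d, zeros)


d = 4096  # hypothetical row width
print("2:4 patterns per group:", nm_group_pattern_count(2, 4))  # 6
digits = len(str(unstructured_row_mask_count(d, 0.5)))
print("unstructured masks per row at 50% sparsity: a", digits, "digit number")
print("per-weight Bernoulli logits per row:", d)
```

The categorical parameterization grows with the number of valid masks, while the per-weight relaxation grows only linearly with the row width, which is what makes the unstructured case tractable.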

