CL · AI · LG · Jan 29, 2025

2SSP: A Two-Stage Framework for Structured Pruning of LLMs

arXiv:2501.17771v2 · 6 citations · h-index: 4 · Has Code · Trans. Mach. Learn. Res.
Originality: Incremental advance
AI Analysis

This work addresses the challenge of reducing computational costs and improving efficiency for LLM deployment, though it appears incremental as it builds on existing pruning methods.

The paper tackles efficient pruning of large language models (LLMs) with a two-stage framework (2SSP) that combines width and depth pruning. Across several sparsity rates and tasks, it consistently outperforms state-of-the-art competitors, with up to a two-order-of-magnitude gain in pruning time.

We propose a novel Two-Stage framework for Structured Pruning (2SSP) for pruning Large Language Models (LLMs), which combines two different pruning strategies, namely Width and Depth Pruning. The first stage (Width Pruning) removes entire neurons, and hence their corresponding rows and columns, aiming to preserve the connectivity among the pruned structures in the intermediate state of the Feed-Forward Networks in each Transformer block. This is done based on an importance score measuring the impact of each neuron on the output magnitude. The second stage (Depth Pruning), instead, removes entire Attention submodules. This is done by applying an iterative process that removes the Attention submodule with the minimum impact on a given metric of interest (in our case, perplexity). We also propose a novel mechanism to balance the sparsity rate of the two stages with respect to the desired global sparsity. We test 2SSP on four LLM families and three sparsity rates (25%, 37.5%, and 50%), measuring the resulting perplexity over three language modeling datasets as well as the performance over six downstream tasks. Our method consistently outperforms five state-of-the-art competitors over three language modeling datasets and six downstream tasks, with an up to two-order-of-magnitude gain in terms of pruning time. The code is available at https://github.com/FabrizioSandri/2SSP.
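The two stages described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation. It assumes a plain two-layer ReLU feed-forward block (real LLM blocks are often gated), it uses root-mean-square activation as a stand-in importance score, and it takes a caller-supplied `metric` function in place of perplexity. All function names are hypothetical, and the mechanism that balances sparsity between the two stages is not covered.

```python
import numpy as np

def ffn_forward(x, w_in, w_out):
    """Toy feed-forward block: x -> relu(x @ w_in) @ w_out."""
    return np.maximum(x @ w_in, 0.0) @ w_out

def neuron_scores(x, w_in):
    """Stand-in importance score (an assumption, not the paper's formula):
    root-mean-square activation of each hidden neuron over calibration inputs."""
    h = np.maximum(x @ w_in, 0.0)           # shape (n_samples, d_hidden)
    return np.sqrt((h ** 2).mean(axis=0))   # shape (d_hidden,)

def prune_width(w_in, w_out, scores, keep):
    """Width pruning: keep the `keep` highest-scoring hidden neurons by
    dropping the matching columns of w_in and rows of w_out."""
    order = np.argsort(scores)                      # ascending by score
    idx = np.sort(order[len(scores) - keep:])       # indices of kept neurons
    return w_in[:, idx], w_out[idx, :]

def prune_depth(n_blocks, n_remove, metric):
    """Depth pruning: greedily remove the attention submodule whose removal
    yields the lowest metric value (lower is better, e.g. perplexity).
    `metric` maps the set of still-active block indices to a float."""
    active = set(range(n_blocks))
    removed = []
    for _ in range(n_remove):
        best = min(sorted(active), key=lambda b: metric(active - {b}))
        active.remove(best)
        removed.append(best)
    return removed
```

In this sketch, removing a hidden neuron deletes one column of the input projection and one row of the output projection together, so the remaining matrices stay shape-compatible. The greedy loop re-evaluates the metric after every removal, which is one straightforward reading of the "iterative process" in the abstract.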

Code Implementations: 1 repo