CL · AI · May 27, 2025

DLP: Dynamic Layerwise Pruning in Large Language Models

Yuli Chen, Bo Cheng, Jiale Han, Yingying Zhang, Yingting Li, Shuhao Zhang

arXiv:2505.23807v3 · 4 citations · h-index: 9 · Has Code · ICML

Originality: Highly original

AI Analysis

This work addresses the trade-off between efficiency and performance in LLM pruning. For AI practitioners, it offers an incremental improvement over existing non-uniform pruning methods.

The paper tackles performance degradation in large language models at high sparsity levels. It proposes Dynamic Layerwise Pruning (DLP), which adaptively assigns a pruning rate to each layer based on that layer's importance. For LLaMA2-7B at 70% sparsity, DLP reduces perplexity by 7.79 and improves average accuracy by 2.7% compared to state-of-the-art methods.

Pruning has recently been widely adopted to reduce the parameter scale and improve the inference efficiency of Large Language Models (LLMs). Mainstream pruning techniques often rely on uniform layerwise pruning strategies, which can lead to severe performance degradation at high sparsity levels. Recognizing the varying contributions of different layers in LLMs, recent studies have shifted their focus toward non-uniform layerwise pruning. However, these approaches often rely on pre-defined values, which can result in suboptimal performance. To overcome these limitations, we propose a novel method called Dynamic Layerwise Pruning (DLP). This approach adaptively determines the relative importance of each layer by integrating model weights with input activation information, assigning pruning rates accordingly. Experimental results show that DLP effectively preserves model performance at high sparsity levels across multiple LLMs. Specifically, at 70% sparsity, DLP reduces the perplexity of LLaMA2-7B by 7.79 and improves the average accuracy by 2.7% compared to state-of-the-art methods. Moreover, DLP is compatible with various existing LLM compression techniques and can be seamlessly integrated into Parameter-Efficient Fine-Tuning (PEFT). We release the code at https://github.com/ironartisan/DLP to facilitate future research.
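The idea of assigning per-layer pruning rates from weights and input activations can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes a Wanda-style score `|w| * ||x||` as the importance signal, a linear mapping from importance to sparsity, and layers of equal size so that the mean rate equals the target. The function names and the `spread` parameter are hypothetical.

```python
def layer_importance(weights, act_norms):
    """Mean of |w_ij| * act_norms[j] over one layer.

    Assumption: a Wanda-style score stands in for the paper's importance
    measure. `weights` is a list of rows; `act_norms[j]` is the norm of
    the j-th input feature over a calibration set.
    """
    total, count = 0.0, 0
    for row in weights:
        for w, a in zip(row, act_norms):
            total += abs(w) * a
            count += 1
    return total / count


def allocate_sparsity(importances, target, spread=0.1):
    """Give more important layers a lower sparsity rate.

    Rates are centred so that their mean equals `target` (which matches
    the overall sparsity only if all layers have the same size).
    `spread` (hypothetical) bounds the deviation from the target.
    """
    if not (0.0 <= target - spread and target + spread <= 1.0):
        raise ValueError("target +/- spread must stay within [0, 1]")
    lo, hi = min(importances), max(importances)
    if hi == lo:
        # All layers equally important: fall back to uniform pruning.
        return [target] * len(importances)
    norm = [(s - lo) / (hi - lo) for s in importances]
    mean_norm = sum(norm) / len(norm)
    return [target + spread * (mean_norm - n) for n in norm]


def prune_layer(weights, act_norms, rate):
    """Zero the lowest-scoring fraction `rate` of weights in one layer.

    Ties at the threshold are all pruned, so slightly more than `rate`
    may be removed when scores repeat.
    """
    scores = sorted(abs(w) * a for row in weights for w, a in zip(row, act_norms))
    k = int(rate * len(scores))
    if k == 0:
        return [list(row) for row in weights]
    threshold = scores[k - 1]
    return [
        [0.0 if abs(w) * a <= threshold else w for w, a in zip(row, act_norms)]
        for row in weights
    ]
```

In use, one would compute `layer_importance` for every layer on calibration data, pass the resulting list to `allocate_sparsity` with the overall target (for example 0.7), and then call `prune_layer` on each layer with its assigned rate. A uniform strategy corresponds to `spread=0`.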
