CL, LG | Oct 15, 2024

DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models

Shangqian Gao, Chi-Heng Lin, Ting Hua, Tang Zheng, Yilin Shen, Hongxia Jin, Yen-Chang Hsu

arXiv:2410.11988v212.927 citations | h-index: 20 | Has Code | NIPS

Originality: Incremental advance

AI Analysis

This work addresses deployment challenges for LLMs on resource-limited devices, offering a more flexible pruning approach that is incremental over prior structural methods.

The paper aims to reduce the memory and computational costs of deploying large language models (LLMs) on resource-limited devices. It proposes a dimension-independent structural pruning method that relaxes the constraints of regular structural pruning and removes structural dependence along the embedding dimension. In evaluations on models such as OPT, LLaMA, and Phi-2, it reports accuracy similar to semi-structural pruning.

Large Language Models (LLMs) have achieved remarkable success in various natural language processing tasks, including language modeling, understanding, and generation. However, the increased memory and computational costs associated with these models pose significant challenges for deployment on resource-limited devices. Structural pruning has emerged as a promising solution to reduce the costs of LLMs without requiring post-processing steps. Prior structural pruning methods either follow the dependence of structures at the cost of limiting flexibility, or introduce non-trivial additional parameters by incorporating different projection matrices. In this work, we propose a novel approach that relaxes the constraint imposed by regular structural pruning methods and eliminates the structural dependence along the embedding dimension. Our dimension-independent structural pruning method offers several benefits. Firstly, our method enables different blocks to utilize different subsets of the feature maps. Secondly, by removing structural dependence, we facilitate each block to possess varying widths along its input and output dimensions, thereby significantly enhancing the flexibility of structural pruning. We evaluate our method on various LLMs, including OPT, LLaMA, LLaMA-2, Phi-1.5, and Phi-2. Experimental results demonstrate that our approach outperforms other state-of-the-art methods, showing for the first time that structural pruning can achieve an accuracy similar to semi-structural pruning.
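To make the two benefits in the abstract concrete, here is a minimal NumPy sketch of a residual block that reads one subset of the embedding dimensions and writes to a different subset. It is an illustration under stated assumptions, not the authors' implementation: the function `pruned_block`, the index arrays `in_idx` and `out_idx`, the ReLU nonlinearity, and all sizes are hypothetical choices made for this example.

```python
import numpy as np


def pruned_block(x, w_in, w_out, in_idx, out_idx):
    """Hypothetical residual block with independent input/output widths.

    x       : (tokens, d_model) residual stream
    in_idx  : embedding dimensions this block reads
    out_idx : embedding dimensions this block writes (must be unique)
    """
    # Read only the selected input dimensions of the residual stream.
    hidden = np.maximum(x[:, in_idx] @ w_in, 0.0)
    # Produce an update that covers only the selected output dimensions.
    update = hidden @ w_out
    # Add the update back into the full-width residual stream.
    y = x.copy()
    y[:, out_idx] += update
    return y


rng = np.random.default_rng(0)
d_model, d_hidden, n_tokens = 8, 4, 3
x = rng.standard_normal((n_tokens, d_model))

# This block reads 3 of the 8 dimensions and writes to 5 of them.
# Another block could use entirely different index sets.
in_idx = np.array([0, 2, 5])
out_idx = np.array([1, 2, 3, 6, 7])
w_in = rng.standard_normal((len(in_idx), d_hidden))
w_out = rng.standard_normal((d_hidden, len(out_idx)))

y = pruned_block(x, w_in, w_out, in_idx, out_idx)

pruned_params = w_in.size + w_out.size        # 3*4 + 4*5
dense_params = 2 * d_model * d_hidden         # 8*4 + 4*8
```

In this sketch the residual stream keeps its full width, so dimensions a block does not write to pass through unchanged, and the block's weight matrices shrink to the sizes of its own index sets rather than to a single width shared by all blocks.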
