CL · AI · LG · Oct 23, 2024

Beware of Calibration Data for Pruning Large Language Models

arXiv:2410.17711v2 · 9 citations · h-index: 11 · Has Code · ICLR
Originality: Incremental advance
AI Analysis

This work addresses a critical but overlooked issue in model compression for LLMs, offering practical improvements for efficient deployment, though it is incremental as it builds on existing pruning techniques.

The paper tackles the problem of calibration data selection for post-training pruning of large language models, finding that data similar to pre-training data improves performance, especially at high sparsity, and proposes a self-generating synthesis strategy that boosts existing pruning methods by up to 2.68%.

As large language models (LLMs) are widely applied across various fields, model compression has become increasingly crucial for reducing costs and improving inference efficiency. Post-training pruning is a promising method that does not require resource-intensive iterative training and needs only a small amount of calibration data to assess the importance of parameters. Recent research has enhanced post-training pruning from different aspects, but little of it systematically explores the effects of calibration data, and it is unclear whether better calibration data construction strategies exist. We fill this gap and, surprisingly, observe that calibration data is also crucial to post-training pruning, especially at high sparsity. Through controlled experiments on important factors of calibration data, including the pruning settings, the amount of data, and its similarity to the pre-training data, we observe that a small amount of data is adequate, and that data more similar to the pre-training data yields better performance. As pre-training data is usually inaccessible for advanced LLMs, we further provide a self-generating calibration data synthesis strategy to construct feasible calibration data. Experimental results on recent strong open-source LLMs (e.g., DCLM and LLaMA-3) show that the proposed strategy can enhance the performance of strong pruning methods (e.g., Wanda, DSnoT, OWL) by a large margin (up to $2.68\%$). Code is available at https://github.com/Dereck0602/calibration_data.
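
To make concrete why the choice of calibration data can change which weights get pruned, here is a minimal sketch. It assumes a Wanda-style importance score (weight magnitude multiplied by the L2 norm of the matching input feature over the calibration set) and prunes within each output row. The function names, the toy weight matrix, and the two toy calibration sets are illustrative assumptions and are not taken from the paper's repository.

```python
import math

def wanda_scores(weight, calib_inputs):
    """Score each weight as |W[i][j]| * ||X[:, j]||_2 over the calibration inputs."""
    n_in = len(weight[0])
    # L2 norm of each input feature across all calibration tokens
    norms = [math.sqrt(sum(x[j] ** 2 for x in calib_inputs)) for j in range(n_in)]
    return [[abs(w) * norms[j] for j, w in enumerate(row)] for row in weight]

def prune_rows(weight, calib_inputs, sparsity):
    """Zero the lowest-scoring fraction of weights within each output row."""
    scores = wanda_scores(weight, calib_inputs)
    pruned = []
    for row, row_scores in zip(weight, scores):
        n_prune = int(len(row) * sparsity)
        # Indices sorted by ascending score; the first n_prune are dropped
        drop = set(sorted(range(len(row)), key=lambda j: row_scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned

# Hypothetical single-row layer and two hypothetical calibration sets
weight = [[0.5, -0.4, 0.3, 0.2]]
calib_a = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]  # uniform activations
calib_b = [[0.1, 0.1, 1.0, 5.0], [0.1, 0.1, 1.0, 5.0]]  # skewed activations

pruned_a = prune_rows(weight, calib_a, 0.5)
pruned_b = prune_rows(weight, calib_b, 0.5)
print(pruned_a)
print(pruned_b)
```

With uniform activations the score reduces to plain weight magnitude, so the two smallest weights are removed. With the skewed set, the two largest weights are removed instead, because their input features are barely active. The same layer at the same sparsity therefore ends up with a different mask depending only on the calibration data.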