AIOct 8, 2023

Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models

arXiv:2310.05015v221 citationsh-index: 13
AI Analysis

This addresses the deployment challenges of LLMs for users with limited hardware resources, offering a novel pruning approach that is incremental over existing one-shot methods.

The paper tackles the problem of compressing large language models (LLMs) for deployment on resource-constrained hardware by introducing Compresso, a structured pruning method that uses collaborative prompting and LoRA-based regularization during training, resulting in pruning LLaMA-7B to 5.4B while maintaining or improving performance, such as a 2.62% increase in reading comprehension.

Despite the remarkable success of Large Language Models (LLMs), the massive size poses significant deployment challenges, particularly on resource-constrained hardware. While existing LLM compression methods focus on quantization, pruning remains relatively unexplored due to the high cost of training-based approaches and data collection challenges. One-shot pruning methods, although cost-effective and data-free, have become dominant in LLM pruning, but lead to performance decline under the structured pruning setting. In this work, we introduce a new paradigm for structurally pruning LLMs, called Compresso. Our approach, through the collaboration of the proposed resource-efficient pruning algorithm and the LLM itself, learns optimal pruning decisions during the training process. Compresso addresses the challenges of expensive training costs and data collection by incorporating Low-Rank Adaptation (LoRA) into the $L_0$ regularization during the instruction tuning process. Then, we further augment the pruning algorithm by introducing a collaborative prompt that fosters collaboration between the LLM and the pruning algorithm, significantly boosting the overall performance. To this end, Compresso prunes LLaMA-7B to 5.4B, maintaining original performance and even surpassing LLaMA-7B in reading comprehension by 2.62%. Extensive experiments demonstrate that Compresso significantly outperforms one-shot pruning baselines across various sparsity ratios, achieving up to 2.21%, 11.43%, 7.04%, and 4.81% higher scores on the commonsense reasoning, reading comprehension, MMLU, and BBH benchmarks, respectively.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes