CL · Feb 28, 2024

SparseLLM: Towards Global Pruning for Pre-trained Language Models

arXiv:2402.17946v4 · 39 citations · h-index: 10 · NIPS
Originality: Incremental advance
AI Analysis

This paper addresses the high computational cost that users of large language models face, offering an incremental improvement over existing pruning methods.

The paper tackles the computational inefficiency of large language models by proposing SparseLLM, a global pruning framework that decomposes the pruning problem into manageable, coordinated subproblems. It reports significant performance improvements and surpasses state-of-the-art methods in high-sparsity regimes.

The transformative impact of large language models (LLMs) like LLaMA and GPT on natural language processing is countered by their prohibitive computational demands. Pruning has emerged as a pivotal compression strategy, introducing sparsity to enhance both memory and computational efficiency. Yet, traditional global pruning is impractical for LLMs due to scalability issues, while local pruning, despite its efficiency, leads to suboptimal solutions. Addressing these challenges, we propose SparseLLM, a novel framework that redefines the global pruning process into manageable, coordinated subproblems, allowing for resource-efficient optimization with global optimality. SparseLLM's approach, which conceptualizes LLMs as a chain of modular functions and leverages auxiliary variables for problem decomposition, not only facilitates a pragmatic application on LLMs but also demonstrates significant performance improvements, particularly in high-sparsity regimes where it surpasses current state-of-the-art methods.
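The contrast the abstract draws between local and global pruning can be illustrated with a minimal sketch. This is not SparseLLM itself: it uses plain magnitude pruning on a toy two-layer ReLU network, and every name here (`magnitude_prune`, `forward`, the layer sizes, the 70% sparsity target) is an assumption chosen for illustration. It only shows that each layer can be pruned and scored on its own (the local view), while the quantity of real interest is the error of the whole chain (the global view).

```python
import numpy as np


def magnitude_prune(weight, sparsity):
    """Return a copy of `weight` with the smallest-magnitude entries set to zero.

    `sparsity` is the fraction of entries to remove (hypothetical helper,
    not part of any SparseLLM code base).
    """
    k = int(round(sparsity * weight.size))
    if k == 0:
        return weight.copy()
    flat = np.abs(weight).ravel()
    # Indices of the k entries with the smallest absolute value.
    drop = np.argpartition(flat, k - 1)[:k]
    mask = np.ones(weight.size, dtype=bool)
    mask[drop] = False
    return weight * mask.reshape(weight.shape)


def forward(x, w1, w2):
    """Toy two-layer chain: linear, ReLU, linear."""
    return np.maximum(x @ w1, 0.0) @ w2


rng = np.random.default_rng(0)
x = rng.normal(size=(32, 16))    # toy calibration inputs
w1 = rng.normal(size=(16, 24))   # dense layer 1
w2 = rng.normal(size=(24, 8))    # dense layer 2

# Local view: prune each layer independently.
p1 = magnitude_prune(w1, 0.7)
p2 = magnitude_prune(w2, 0.7)

# Per-layer reconstruction errors, each measured on that layer's dense input.
hidden = np.maximum(x @ w1, 0.0)
local_err_1 = np.linalg.norm(x @ w1 - x @ p1)
local_err_2 = np.linalg.norm(hidden @ w2 - hidden @ p2)

# Global view: error of the full pruned chain against the dense chain.
global_err = np.linalg.norm(forward(x, w1, w2) - forward(x, p1, p2))

print(f"layer 1 local error: {local_err_1:.3f}")
print(f"layer 2 local error: {local_err_2:.3f}")
print(f"end-to-end error:    {global_err:.3f}")
```

The per-layer errors are computed in isolation, so they do not account for how the error introduced in layer 1 propagates through layer 2. That gap between local objectives and the end-to-end objective is what the abstract says SparseLLM aims to close through coordinated subproblems.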

Code Implementations: 2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
