CL · Aug 7, 2023

Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models

arXiv:2308.03449v2 · 14 citations · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient model compression in NLP by reducing the cost of deploying large language models. It is an incremental advance, since it builds on existing pruning techniques.

The paper tackles the problem of compressing pretrained encoder-based language models without retraining, a setting in which existing methods often lose accuracy. It reports an F1 score up to 58.02 percentage points higher than existing retraining-free methods at an 80% compression rate on SQuAD.

Given a pretrained encoder-based language model, how can we accurately compress it without retraining? Retraining-free structured pruning algorithms are crucial in pretrained language model compression due to their significantly reduced pruning cost and capability to prune large language models. However, existing retraining-free algorithms encounter severe accuracy degradation, as they fail to handle pruning errors, especially at high compression rates. In this paper, we propose K-prune (Knowledge-preserving pruning), an accurate retraining-free structured pruning algorithm for pretrained encoder-based language models. K-prune focuses on preserving the useful knowledge of the pretrained model to minimize pruning errors through a carefully designed iterative pruning process composed of knowledge measurement, knowledge-preserving mask search, and knowledge-preserving weight-tuning. As a result, K-prune shows significant accuracy improvements up to 58.02%p higher F1 score compared to existing retraining-free pruning algorithms under a high compression rate of 80% on the SQuAD benchmark without any retraining process.
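The abstract names three stages (knowledge measurement, mask search, weight-tuning) without giving their formulas. The following is only a rough sketch of that general shape, not K-prune itself. It assumes a single linear layer, a squared-contribution importance score, top-k mask selection, and least-squares re-fitting on calibration data. All of these, along with the function names `layer_output`, `solve`, and `prune_and_tune`, are stand-ins chosen for illustration.

```python
import random

def layer_output(H, w):
    """Output per calibration sample: sum_j H[i][j] * w[j]."""
    return [sum(h * wj for h, wj in zip(row, w)) for row in H]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def prune_and_tune(H, w, keep_ratio):
    """Hypothetical three-stage sketch: score, mask, then re-fit weights."""
    n = len(w)
    target = layer_output(H, w)  # outputs of the unpruned layer

    # 1) Measurement (assumed score): mean squared contribution per neuron.
    scores = [sum((row[j] * w[j]) ** 2 for row in H) / len(H) for j in range(n)]

    # 2) Mask search (assumed rule): keep the k highest-scoring neurons.
    k = max(1, round(n * keep_ratio))
    kept = sorted(sorted(range(n), key=lambda j: scores[j], reverse=True)[:k])
    mask = [1 if j in kept else 0 for j in range(n)]
    masked_w = [wj * m for wj, m in zip(w, mask)]

    # 3) Weight-tuning (assumed method): least squares on the kept neurons,
    #    solved through the normal equations (Hk^T Hk) x = Hk^T target.
    Hk = [[row[j] for j in kept] for row in H]
    A = [[sum(r[a] * r[b] for r in Hk) for b in range(k)] for a in range(k)]
    rhs = [sum(r[a] * t for r, t in zip(Hk, target)) for a in range(k)]
    tuned_w = [0.0] * n
    for j, v in zip(kept, solve(A, rhs)):
        tuned_w[j] = v
    return mask, masked_w, tuned_w

# Synthetic calibration data: 64 samples, 10 correlated "neurons".
rng = random.Random(0)
H = []
for _ in range(64):
    shared = rng.gauss(0, 1)
    H.append([shared + rng.gauss(0, 1) for _ in range(10)])
w = [rng.gauss(0, 1) for _ in range(10)]

# keep_ratio=0.2 removes 80% of the neurons.
mask, masked_w, tuned_w = prune_and_tune(H, w, keep_ratio=0.2)
target = layer_output(H, w)
err_masked = mse(layer_output(H, masked_w), target)
err_tuned = mse(layer_output(H, tuned_w), target)
print("error after masking only:", err_masked)
print("error after re-fitting:  ", err_tuned)
```

Because the re-fit minimizes squared error over the kept neurons, its error on the calibration data cannot exceed that of masking alone. This illustrates why a tuning step after mask selection can limit pruning error without any gradient-based retraining.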

Code Implementations: 1 repo
