CL · LG · Jan 22, 2024

APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference

arXiv:2401.12200v2 · 32 citations · h-index: 82 · ICML
Originality: Incremental advance
AI Analysis

This paper addresses efficiency challenges for users of large language models by offering a method that balances task performance against resource use. The advance is incremental, since it builds on existing pruning and tuning techniques.

The paper tackles the problem of expensive fine-tuning and inference in large language models by introducing APT, which adaptively prunes and tunes parameters to improve both training and inference efficiency. It retains up to 98% of task performance with only 40% of parameters remaining (that is, 60% pruned) and speeds up fine-tuning by up to 8x.

Fine-tuning and inference with large language models (LMs) are generally known to be expensive. Parameter-efficient fine-tuning over pretrained LMs reduces training memory by updating a small number of LM parameters but does not improve inference efficiency. Structured pruning improves LM inference efficiency by removing consistent parameter blocks, yet often increases training memory and time. To improve both training and inference efficiency, we introduce APT, which adaptively prunes and tunes the parameters of LMs. At the early stage of fine-tuning, APT dynamically adds salient tuning parameters for fast and accurate convergence while discarding unimportant parameters for efficiency. Compared to baselines, our experiments show that APT maintains up to 98% of task performance when pruning RoBERTa and T5 models with 40% of parameters left, while keeping 86.4% of LLaMA models' performance with 70% of parameters remaining. Furthermore, APT speeds up LM fine-tuning by up to 8x and reduces the training memory footprint of large LMs by up to 70%.
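The "discarding unimportant parameters" half of the idea can be illustrated with a small sketch of salience-based structured pruning. This is not the paper's implementation: the salience score used here (the sum of |weight × gradient| over each block, a common first-order importance estimate), the function names `block_salience` and `prune_blocks`, and the toy numbers are all assumptions chosen for illustration. Each row of `weights` stands in for one removable parameter block.

```python
def block_salience(weights, grads):
    """Assumed first-order salience per block (row): sum of |w * g| over the row."""
    return [
        sum(abs(w * g) for w, g in zip(w_row, g_row))
        for w_row, g_row in zip(weights, grads)
    ]


def prune_blocks(weights, grads, keep_ratio):
    """Keep the most salient rows; return (kept_indices, pruned_weights)."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    scores = block_salience(weights, grads)
    n_keep = max(1, round(keep_ratio * len(weights)))
    # Rank block indices from most to least salient, then keep the top n_keep.
    ranked = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # restore original block order
    return kept, [weights[i] for i in kept]


# Toy example (hypothetical values): 5 blocks, keep 40% of them.
weights = [[0.5, -0.2], [0.01, 0.02], [1.0, 0.3], [-0.4, 0.4], [0.05, -0.01]]
grads = [[0.1, 0.1], [0.5, 0.5], [0.2, -0.1], [0.0, 0.0], [0.3, 0.3]]
kept, pruned = prune_blocks(weights, grads, keep_ratio=0.4)
print(kept)  # [0, 2]
```

Whole rows are removed rather than individual weights, which is what makes the pruning "structured": the remaining matrix is simply smaller, so inference gets cheaper without needing sparse kernels. How APT actually scores blocks, and how it decides where to add tuning parameters, is described in the paper itself.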

Code Implementations: 1 repo
