CL · LG · May 15, 2020

Movement Pruning: Adaptive Sparsity by Fine-Tuning

arXiv:2005.07683v2 · 612 citations
AI Analysis

This paper addresses the need for efficient model compression in NLP applications by offering a pruning method that adapts better to fine-tuned models. The contribution is incremental, since it builds on existing pruning techniques.

The paper tackles the problem of pruning large pretrained language models in the transfer learning setting, where magnitude pruning is less effective. It proposes movement pruning, a deterministic first-order method that shows significant improvements in high-sparsity regimes. When combined with distillation, it achieves minimal accuracy loss with as little as 3% of the model parameters.

Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications. We propose the use of movement pruning, a simple, deterministic first-order weight pruning method that is more adaptive to pretrained model fine-tuning. We give mathematical foundations to the method and compare it to existing zeroth- and first-order pruning methods. Experiments show that when pruning large pretrained language models, movement pruning shows significant improvements in high-sparsity regimes. When combined with distillation, the approach achieves minimal accuracy loss with down to only 3% of the model parameters.
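To make the contrast between the two criteria concrete, here is a toy sketch in plain Python. It rests on assumptions that the abstract does not spell out: the movement score of each weight is taken to accumulate S ← S − α · (∂L/∂W) · W during fine-tuning, so weights moving away from zero gain score and weights shrinking toward zero lose it. The gradients are hypothetical constants, the helper names (`topk_mask`, `update_movement_scores`) are invented for illustration, and masking during the forward pass is omitted, so this is not the authors' implementation.

```python
def topk_mask(scores, keep_fraction):
    """Return a 0/1 mask that keeps the highest-scoring fraction of entries."""
    k = max(1, round(keep_fraction * len(scores)))
    # Rank indices by descending score; ties go to the lower index.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]


def magnitude_scores(weights):
    """Zeroth-order criterion: importance is the absolute weight value."""
    return [abs(w) for w in weights]


def update_movement_scores(scores, weights, grads, lr):
    """Assumed first-order rule: S <- S - lr * grad * weight.

    grad * weight > 0 means gradient descent shrinks |weight|, so the
    score drops; grad * weight < 0 means |weight| grows, so it rises.
    """
    return [s - lr * g * w for s, w, g in zip(scores, weights, grads)]


# Toy "pretrained" weights: two large, two small.
weights = [0.9, -0.8, 0.05, -0.02]
# Hypothetical constant gradients: the large weights are pushed toward
# zero, the small ones away from zero.
grads = [0.5, -0.4, -0.6, 0.7]

lr = 0.1
scores = [0.0] * len(weights)
for _ in range(3):
    scores = update_movement_scores(scores, weights, grads, lr)
    weights = [w - lr * g for w, g in zip(weights, grads)]  # plain SGD step

magnitude_mask = topk_mask(magnitude_scores(weights), keep_fraction=0.5)
movement_mask = topk_mask(scores, keep_fraction=0.5)
print("magnitude keeps:", magnitude_mask)
print("movement keeps: ", movement_mask)
```

In this toy setting, magnitude pruning keeps the two weights that are still large after fine-tuning, while the movement criterion keeps the two that fine-tuning is actively growing. This is the sense in which a first-order criterion can adapt to the fine-tuning task rather than to the pretrained values.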

Code Implementations: 4 repos
