Pruning Pre-trained Language Models with Principled Importance and Self-regularization
This work addresses the problem of model compression for NLP practitioners, offering an incremental improvement over existing pruning methods by introducing a more principled optimization approach and regularization technique.
The paper tackles the compression of pre-trained language models via iterative pruning by formulating the pruning decision as an equality-constrained 0-1 Integer Linear Programming problem. The formulation yields a principled importance criterion and motivates a self-regularization scheme that improves generalization at high sparsity levels; experiments across multiple NLP tasks show the approach is effective at various sparsity levels.
Iterative pruning is one of the most effective compression methods for pre-trained language models. We discovered that finding the optimal pruning decision is an equality-constrained 0-1 Integer Linear Programming problem. The solution to this optimization problem leads to a principled importance criterion, which we use to rank parameters during iterative model pruning. To mitigate poor generalization at high sparsity levels, we propose a self-regularization scheme in which the model's predictions are regularized by the latest checkpoint, whose sparsity increases throughout pruning. Our experiments on natural language understanding, question answering, named entity recognition, and data-to-text generation with various Transformer-based PLMs show the effectiveness of the approach at various sparsity levels.
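To make the optimization view concrete, the following is a minimal sketch of why an equality-constrained 0-1 problem with a linear objective reduces to a ranking. It assumes, purely for illustration, that each parameter i carries a scalar cost `scores[i]` of being pruned and that exactly `num_pruned` parameters must be removed; the abstract does not state the objective or the importance criterion, so the scores, function names, and objective form here are hypothetical placeholders rather than the paper's method. Under that assumption, picking the `num_pruned` smallest scores is optimal, which the sketch checks against exhaustive search on a toy input.

```python
from itertools import combinations


def objective(scores, mask):
    """Total cost of a pruning decision: sum of scores of pruned entries."""
    return sum(s * x for s, x in zip(scores, mask))


def prune_mask_topk(scores, num_pruned):
    """Solve min sum_i scores[i] * x[i]  s.t.  sum_i x[i] == num_pruned, x[i] in {0, 1}.

    With a linear objective and a single cardinality constraint, sorting
    and taking the num_pruned smallest scores is optimal.
    (scores is a hypothetical placeholder, not the paper's criterion.)
    """
    if not 0 <= num_pruned <= len(scores):
        raise ValueError("num_pruned must lie between 0 and len(scores)")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    chosen = set(order[:num_pruned])
    return [1 if i in chosen else 0 for i in range(len(scores))]


def prune_mask_bruteforce(scores, num_pruned):
    """Enumerate every feasible 0-1 assignment; only viable for tiny inputs."""
    best_cost, best_mask = None, None
    for subset in combinations(range(len(scores)), num_pruned):
        mask = [1 if i in subset else 0 for i in range(len(scores))]
        cost = objective(scores, mask)
        if best_cost is None or cost < best_cost:
            best_cost, best_mask = cost, mask
    return best_mask


if __name__ == "__main__":
    toy_scores = [0.9, 0.1, 0.4, 0.7, 0.2]  # made-up values for illustration
    print(prune_mask_topk(toy_scores, 2))
    print(prune_mask_bruteforce(toy_scores, 2))
```

In an iterative pruning loop, a ranking of this kind would be recomputed at each step as the target sparsity grows; how the scores themselves are derived is exactly what the importance criterion in the paper specifies, and it is not reproduced here.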