CL · AI · LG · Mar 30, 2023

oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes

arXiv:2303.17612v3222 citations · h-index: 24
Originality: Incremental advance
AI Analysis

This work provides NLP practitioners with easy-to-use compressed models for faster inference, though it is incremental as it builds on existing compression techniques.

The paper tackles the problem of making language models faster without sacrificing accuracy. It introduces oBERTa, a family of models that improves sparse transfer learning through better initialization, distillation, and pruning, yielding models that run between 3.8 and 24.3 times faster. On the SQuAD v1.1 question answering dataset, a pruned oBERTa model matches the performance of BERTbase and exceeds that of Prune OFA Large while being 8x and 2x faster in inference, respectively.

In this paper, we introduce the range of oBERTa language models, an easy-to-use set of language models that allows Natural Language Processing (NLP) practitioners to obtain between 3.8 and 24.3 times faster models without expertise in model compression. Specifically, oBERTa extends existing work on pruning, knowledge distillation, and quantization, and leverages frozen embeddings, improved distillation, and improved model initialization to deliver higher accuracy on a broad range of transfer tasks. In generating oBERTa, we explore how the highly optimized RoBERTa differs from BERT when pruned during pre-training and fine-tuning, and we find RoBERTa less amenable to compression during fine-tuning. We evaluate oBERTa on seven representative NLP tasks and find that the improved compression techniques allow a pruned oBERTa model to match the performance of BERTbase and exceed the performance of Prune OFA Large on the SQuAD v1.1 question answering dataset, despite being 8x and 2x faster in inference, respectively. We release our code, training regimes, and associated models to encourage broad usage and experimentation.
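To make two of the ingredients named in the abstract concrete, here is a minimal sketch of unstructured pruning and a knowledge-distillation loss in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the pruning criterion shown is simple weight magnitude, which may differ from the criterion used for oBERTa, and the function names and the `temperature` and `hardness` parameters are hypothetical choices made for this example. Frozen embeddings and quantization are not shown.

```python
import math


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of a flat weight list.

    Assumption: magnitude is used as the saliency score purely for illustration.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, hardness=0.5):
    """Blend a soft-target term with a hard-label term.

    loss = hardness * T^2 * KL(teacher || student) + (1 - hardness) * CE(student, label)
    `temperature` and `hardness` are hypothetical defaults, not values from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    return hardness * temperature ** 2 * kl + (1.0 - hardness) * ce


# Toy usage: prune half of a ten-weight layer, then score a student against a teacher.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.0, -0.4, 0.1, 0.9]
sparse_weights = magnitude_prune(weights, sparsity=0.5)
loss = distillation_loss([1.0, 0.2, -0.5], [2.0, 0.1, -1.0], label=0)
```

In a sparse transfer setting, a loss of this shape would be minimized while the pruning mask is held fixed, so the student keeps its sparsity as it learns the downstream task from the teacher.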

