CL · Oct 21, 2022

Performance-Efficiency Trade-Offs in Adapting Language Models to Text Classification Tasks

arXiv:2210.12022v1296 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses the efficiency challenges practitioners face when deploying language models in real-world applications. It is an incremental contribution, since it compares existing adaptation methods.

The paper tackles the problem of adapting pre-trained language models to text classification tasks while balancing performance and efficiency, finding that prompting combined with knowledge distillation can reduce compute and data costs simultaneously.

Pre-trained language models (LMs) obtain state-of-the-art performance when adapted to text classification tasks. However, when using such models in real-world applications, efficiency considerations are paramount. In this paper, we study how different training procedures that adapt LMs to text classification perform, as we vary model and train set size. More specifically, we compare standard fine-tuning, prompting, and knowledge distillation (KD) when the teacher was trained with either fine-tuning or prompting. Our findings suggest that even though fine-tuning and prompting work well to train large LMs on large train sets, there are more efficient alternatives that can reduce compute or data cost. Interestingly, we find that prompting combined with KD can reduce compute and data cost at the same time.
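
For readers unfamiliar with knowledge distillation, the following is a minimal sketch of a soft-label distillation loss for a classifier. The abstract does not state which distillation objective the authors use, so this assumes the common formulation in which the student matches the teacher's temperature-softened class distribution. The function names `softmax` and `distillation_loss` and the `temperature` parameter are illustrative, not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw class scores to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions, scaled by T^2.

    Assumption: a standard soft-label objective; the paper may use a variant.
    """
    p = softmax(teacher_logits, temperature)  # teacher (target) distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits for a 3-class text classification example.
teacher = [4.0, 1.0, -2.0]
student = [2.5, 0.5, -1.0]
loss = distillation_loss(student, teacher)
```

In this setup the teacher could be a large LM adapted by either fine-tuning or prompting, and the loss is zero only when the student reproduces the teacher's softened distribution exactly.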
