CL · LG · Jan 26, 2021

Muppet: Massive Multi-task Representations with Pre-Finetuning

arXiv:2101.11038v1 · 729 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of improving task-specific adaptation for NLP practitioners, though the advance is incremental because it builds on the existing pre-training and fine-tuning paradigm.

The paper aims to improve generalization and sample efficiency in language models by introducing pre-finetuning, a large-scale multi-task learning stage between pre-training and fine-tuning. Pre-finetuning consistently improves performance across tasks and models, but only when enough tasks are used: with few tasks it can hurt performance, and gains appear past a critical point of roughly 15 tasks.

We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g. RoBERTa) and generation models (e.g. BART) on a wide range of tasks (sentence prediction, commonsense reasoning, MRC, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial; pre-finetuning can hurt performance when few tasks are used up until a critical point (usually above 15) after which performance improves linearly in the number of tasks.
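
The multi-task stage described in the abstract can be illustrated with a toy batch sampler. This is a minimal sketch, not the authors' implementation: the function name `multitask_batches`, the size-proportional sampling rule, and the three toy datasets are all assumptions made for illustration.

```python
import random
from collections import Counter


def multitask_batches(datasets, batch_size, num_batches, seed=0):
    """Yield batches that mix examples from several labeled datasets.

    datasets: dict mapping a task name to a list of examples.
    Assumption of this sketch: a task is drawn with probability
    proportional to its dataset size; other mixing rules are possible.
    """
    rng = random.Random(seed)
    names = list(datasets)
    weights = [len(datasets[name]) for name in names]
    for _ in range(num_batches):
        # Pick a task for every slot in the batch, then an example from it.
        tasks = rng.choices(names, weights=weights, k=batch_size)
        yield [(task, rng.choice(datasets[task])) for task in tasks]


# Hypothetical toy data standing in for the roughly 50 real datasets.
toy = {
    "sentence_prediction": [f"sp-{i}" for i in range(60)],
    "commonsense": [f"cs-{i}" for i in range(30)],
    "mrc": [f"mrc-{i}" for i in range(10)],
}

batches = list(multitask_batches(toy, batch_size=8, num_batches=5))
task_counts = Counter(task for batch in batches for task, _ in batch)
print(task_counts)
```

Each yielded batch contains `(task_name, example)` pairs, so a training loop could route every example to the loss for its own task while updating one shared model.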

Code Implementations: 2 repos
