CL · Sep 3, 2021

Finetuned Language Models Are Zero-Shot Learners

arXiv:2109.01652v5 · 5087 citations · Has code
Originality: Highly original
AI Analysis

This work addresses the challenge of adapting large language models to unseen tasks without task-specific training, offering a simple yet effective method for improving zero-shot capabilities in NLP.

The paper tackles zero-shot learning in language models by introducing instruction tuning: a 137B-parameter model is finetuned on over 60 NLP tasks phrased as natural language instructions. The resulting model surpasses zero-shot GPT-3 on 20 of 25 tasks and outperforms few-shot GPT-3 on several benchmarks.
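To make "tasks phrased as natural language instructions" concrete, here is a minimal sketch of how one labeled example might be rendered into an (input, target) text pair. The `Example` class, the `verbalize` helper, the template wording, and the `OPTIONS` suffix are all hypothetical illustrations, not the paper's actual templates or code; the only idea taken from the source is that each dataset is verbalized through several instruction templates, one of which is sampled per example.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    """A hypothetical NLI-style example (premise, hypothesis, gold label)."""
    premise: str
    hypothesis: str
    label: str  # one of OPTIONS below


# Illustrative answer options and templates; the wording is invented for this sketch.
OPTIONS = ["yes", "no", "it is not possible to tell"]
NLI_TEMPLATES = [
    "Premise: {premise}\nHypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis?\nOPTIONS: {options}",
    "{premise}\nBased on the paragraph above, can we conclude that "
    "\"{hypothesis}\"?\nOPTIONS: {options}",
]


def verbalize(ex: Example, rng: random.Random) -> tuple[str, str]:
    """Render one example as an (input text, target text) pair using a randomly sampled template."""
    template = rng.choice(NLI_TEMPLATES)
    prompt = template.format(
        premise=ex.premise,
        hypothesis=ex.hypothesis,
        options=", ".join(OPTIONS),
    )
    # The target is plain text, so every task shares one text-to-text finetuning objective.
    return prompt, ex.label


if __name__ == "__main__":
    rng = random.Random(0)
    ex = Example("A dog runs in the park.", "An animal is outside.", "yes")
    prompt, target = verbalize(ex, rng)
    print(prompt)
    print("->", target)
```

Sampling a template per example, rather than fixing one, is a design choice meant to keep the model from overfitting to a single phrasing of each task.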

This paper explores a simple method for improving the zero-shot learning abilities of language models. We show that instruction tuning (finetuning language models on a collection of tasks described via instructions) substantially improves zero-shot performance on unseen tasks. We take a 137B-parameter pretrained language model and instruction-tune it on over 60 NLP tasks verbalized via natural language instruction templates. We evaluate this instruction-tuned model, which we call FLAN, on unseen task types. FLAN substantially improves the performance of its unmodified counterpart and surpasses zero-shot 175B GPT-3 on 20 of 25 tasks that we evaluate. FLAN even outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze. Ablation studies reveal that the number of finetuning datasets, model scale, and natural language instructions are key to the success of instruction tuning.
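Evaluating on "unseen task types" requires that no dataset of the evaluated type appears during instruction tuning. The FLAN paper does this by grouping datasets into task clusters and holding out a whole cluster at a time. The sketch below shows that leave-one-cluster-out split; the `TASK_CLUSTERS` mapping is a small hypothetical subset with illustrative cluster names, and `leave_cluster_out` is an invented helper, not code from the paper.

```python
# Hypothetical dataset -> task-cluster mapping (illustrative subset only).
TASK_CLUSTERS: dict[str, str] = {
    "anli": "nli",
    "rte": "nli",
    "cb": "nli",
    "boolq": "reading_comprehension",
    "multirc": "reading_comprehension",
    "arc": "closed_book_qa",
    "nq": "closed_book_qa",
    "wmt16_en_de": "translation",
}


def leave_cluster_out(clusters: dict[str, str], held_out: str) -> tuple[list[str], list[str]]:
    """Return (datasets to instruction-tune on, datasets to evaluate on).

    Every dataset whose cluster equals `held_out` is excluded from tuning,
    so the evaluated task type is never seen during finetuning.
    """
    train = sorted(d for d, c in clusters.items() if c != held_out)
    evaluate = sorted(d for d, c in clusters.items() if c == held_out)
    if not evaluate:
        raise ValueError(f"unknown cluster: {held_out}")
    return train, evaluate


if __name__ == "__main__":
    for cluster in sorted(set(TASK_CLUSTERS.values())):
        train, evaluate = leave_cluster_out(TASK_CLUSTERS, cluster)
        print(f"hold out {cluster}: tune on {len(train)} datasets, evaluate on {evaluate}")
```

Holding out the whole cluster, rather than a single dataset, prevents a closely related dataset of the same type (for example, another NLI set) from leaking into training and inflating the "zero-shot" result.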

Code implementations: 8 repos

Data from Papers with Code (CC-BY-SA-4.0)
