CL · Sep 3, 2021

Finetuned Language Models Are Zero-Shot Learners

Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le

arXiv:2109.01652v5 · 45.9 · 5263 citations · Has Code

Originality: Highly original

AI Analysis

This work addresses the challenge of adapting large language models to unseen tasks without task-specific training, offering a simple yet effective method for improving zero-shot capabilities in NLP.

The paper tackles zero-shot learning in language models by introducing instruction tuning: a 137B-parameter model is finetuned on over 60 NLP tasks phrased as natural language instructions. This yields substantial gains, including surpassing zero-shot 175B GPT-3 on 20 of 25 evaluated tasks and outperforming few-shot GPT-3 on several benchmarks, such as ANLI, RTE, and BoolQ.

This paper explores a simple method for improving the zero-shot learning abilities of language models. We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially improves zero-shot performance on unseen tasks. We take a 137B-parameter pretrained language model and instruction-tune it on over 60 NLP tasks verbalized via natural language instruction templates. We evaluate this instruction-tuned model, which we call FLAN, on unseen task types. FLAN substantially improves the performance of its unmodified counterpart and surpasses zero-shot 175B GPT-3 on 20 of the 25 tasks that we evaluate. FLAN even outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze. Ablation studies reveal that the number of finetuning datasets, model scale, and natural language instructions are key to the success of instruction tuning.
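To make "verbalized via natural language instruction templates" and "unseen task types" concrete, here is a minimal Python sketch of the data side of instruction tuning. It is not FLAN's actual pipeline: the template strings, the task-type names (`nli`, `sentiment`, `qa`), and the helpers `verbalize` and `build_splits` are all hypothetical. It assumes each raw example is a dict of text fields plus a `label`, and that holding out an entire task type for evaluation is how "unseen" is enforced.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical instruction templates, keyed by task type (illustrative only,
# not the templates used in the paper). Each one rewrites a raw example's
# fields as a natural-language instruction.
TEMPLATES: Dict[str, List[str]] = {
    "nli": [
        "Premise: {premise}\nHypothesis: {hypothesis}\n"
        "Does the premise entail the hypothesis? Options: yes, no",
        "{premise}\nBased on the paragraph above, can we conclude that "
        "\"{hypothesis}\"? Options: yes, no",
    ],
    "sentiment": ["Review: {text}\nIs this review positive or negative?"],
    "qa": ["Answer the question: {question}"],
}


def verbalize(task_type: str, example: Dict[str, str],
              rng: random.Random) -> Tuple[str, str]:
    """Render one raw example as an (instruction, target) text pair
    using a randomly chosen template for its task type."""
    template = rng.choice(TEMPLATES[task_type])
    fields = {k: v for k, v in example.items() if k != "label"}
    return template.format(**fields), example["label"]


def build_splits(datasets: Dict[str, List[Dict[str, str]]],
                 held_out_type: str, seed: int = 0):
    """Put every task type except `held_out_type` into the finetuning mix;
    the held-out type is reserved for zero-shot evaluation."""
    rng = random.Random(seed)
    train, evaluation = [], []
    for task_type, examples in datasets.items():
        pairs = [verbalize(task_type, ex, rng) for ex in examples]
        (evaluation if task_type == held_out_type else train).extend(pairs)
    return train, evaluation


# Usage with toy data: hold out NLI so it is never seen during tuning.
datasets = {
    "nli": [{"premise": "A dog runs in the park.",
             "hypothesis": "An animal is outside.", "label": "yes"}],
    "sentiment": [{"text": "Great movie!", "label": "positive"}],
    "qa": [{"question": "What is the capital of France?", "label": "Paris"}],
}
train, evaluation = build_splits(datasets, held_out_type="nli")
print(len(train), len(evaluation))  # 2 1
```

Sampling a template per example, rather than fixing one phrasing per task, is a simple way to expose the model to varied wordings of the same task. The model would then be finetuned on `train` and scored zero-shot on `evaluation`.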

