CL · Mar 7, 2023

Larger language models do in-context learning differently

Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma

DeepMind

arXiv:2303.03846v2 · 31.4 · 483 citations · h-index: 46

Originality: Incremental advance

AI Analysis

This work deepens the fundamental understanding of in-context learning in language models, a topic of direct interest to AI researchers and practitioners, although it is an incremental advance that builds on prior studies of emergent abilities.

The paper investigates how language models perform in-context learning by analyzing the interplay between semantic priors and input-label mappings. It finds that larger models can override semantic priors when given contradictory exemplars and can learn semantically-unrelated labels, that both abilities emerge primarily with scale, and that instruction tuning strengthens both the use of semantic priors and the capacity to learn input-label mappings.

We study how in-context learning (ICL) in language models is affected by semantic priors versus input-label mappings. We investigate two setups, ICL with flipped labels and ICL with semantically-unrelated labels, across various model families (GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM). First, experiments on ICL with flipped labels show that overriding semantic priors is an emergent ability of model scale. While small language models ignore flipped labels presented in-context and thus rely primarily on semantic priors from pretraining, large models can override semantic priors when presented with in-context exemplars that contradict priors, despite the stronger semantic priors that larger models may hold. We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task. The ability to do SUL-ICL also emerges primarily with scale, and large-enough language models can even perform linear classification in a SUL-ICL setting. Finally, we evaluate instruction-tuned models and find that instruction tuning strengthens both the use of semantic priors and the capacity to learn input-label mappings, but more of the former.
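To make the setups concrete, here is a minimal Python sketch of how prompts for regular, flipped-label, and SUL-ICL evaluation could be assembled and scored. Only the foo/bar targets (negative becomes foo, positive becomes bar) come from the abstract's example. The exemplar sentences, the `Input:`/`Output:` template, the function names (`relabel`, `build_prompt`, `accuracy`, `linear_task`), and the toy linear task are hypothetical choices for illustration, not the paper's datasets, prompt format, or code.

```python
import random

# Hypothetical toy sentiment exemplars (not the paper's datasets).
EXEMPLARS = [
    ("A delightful, moving film.", "positive"),
    ("Dull and far too long.", "negative"),
    ("The cast is superb.", "positive"),
    ("I walked out halfway through.", "negative"),
]

# Flipped-label setup: each natural-language label is swapped for its opposite.
FLIPPED = {"positive": "negative", "negative": "positive"}
# SUL-ICL setup: labels replaced by semantically unrelated targets
# (foo/bar instead of negative/positive, following the abstract's example).
SUL = {"negative": "foo", "positive": "bar"}


def relabel(label: str, setup: str) -> str:
    """Map a gold label to the label shown (and expected) under a given setup."""
    if setup == "regular":
        return label
    if setup == "flipped":
        return FLIPPED[label]
    if setup == "sul":
        return SUL[label]
    raise ValueError(f"unknown setup: {setup}")


def build_prompt(exemplars, query: str, setup: str = "regular") -> str:
    """Format labeled in-context exemplars followed by an unlabeled query.

    The 'Input:/Output:' template is a hypothetical choice for illustration.
    """
    blocks = [
        f"Input: {text}\nOutput: {relabel(label, setup)}\n"
        for text, label in exemplars
    ]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n".join(blocks)


def accuracy(predictions, golds, setup: str) -> float:
    """Score predictions against the in-context mapping, not the prior.

    Under 'flipped', a model that follows the exemplars should answer the
    flipped label; a model relying on semantic priors answers the gold one.
    """
    targets = [relabel(g, setup) for g in golds]
    hits = sum(p.strip() == t for p, t in zip(predictions, targets))
    return hits / len(targets)


def linear_task(n: int, dim: int, seed: int = 0):
    """Hypothetical linearly separable task: label is the sign of w . x.

    Pass the result to build_prompt(..., setup="sul") so the model sees only
    foo/bar targets and must infer the mapping from the exemplars alone.
    """
    rng = random.Random(seed)
    w = [rng.randint(-5, 5) for _ in range(dim)]
    data = []
    for _ in range(n):
        x = [rng.randint(0, 9) for _ in range(dim)]
        score = sum(wi * xi for wi, xi in zip(w, x))
        label = "positive" if score > 0 else "negative"
        data.append((" ".join(map(str, x)), label))
    return data


if __name__ == "__main__":
    print(build_prompt(EXEMPLARS, "An instant classic.", setup="flipped"))
    print()
    print(build_prompt(linear_task(4, dim=3), "3 7 1", setup="sul"))
```

In an actual evaluation, each prompt would be sent to a model and its completions compared with `relabel(gold, setup)` through `accuracy`; which model or API to call is deliberately left open here. Scoring against the relabeled target rather than the gold label is what separates a model that follows the in-context mapping from one that falls back on its semantic priors.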

