CL AI LG ML · Aug 18, 2024

Out-of-distribution generalization via composition: a lens through induction heads in Transformers

arXiv:2408.09503v2 · 12.627 citations · h-index: 5 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the open question of OOD generalization in LLMs, which is crucial for their ability to solve novel tasks without fine-tuning, though it appears incremental because it focuses on a specific mechanism, induction heads.

The paper asks how large language models achieve out-of-distribution generalization by inferring hidden rules from prompts. It finds that models can learn such rules by composing self-attention layers, specifically via induction heads, and that a shared latent subspace facilitates this composition.

Large language models (LLMs) such as GPT-4 sometimes appear to be creative, solving novel tasks, often with only a few demonstrations in the prompt. These tasks require the models to generalize to distributions that differ from the training data, which is known as out-of-distribution (OOD) generalization. Despite the tremendous success of LLMs, how they approach OOD generalization remains an open and underexplored question. We examine OOD generalization in settings where instances are generated according to hidden rules, including in-context learning with symbolic reasoning. Models are required to infer the hidden rules behind input prompts without any fine-tuning. We empirically examined the training dynamics of Transformers on a synthetic example and conducted extensive experiments on a variety of pretrained LLMs, focusing on a type of component known as induction heads. We found that OOD generalization and composition are tied together: models can learn rules by composing two self-attention layers, thereby achieving OOD generalization. Furthermore, a shared latent subspace in the embedding (or feature) space acts as a bridge for composition by aligning early layers and later layers, which we refer to as the common bridge representation hypothesis.
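To make "composing two self-attention layers" concrete, here is a minimal hand-built sketch of the textbook induction-head circuit: a previous-token head in layer 1, followed by an induction head in layer 2 whose keys read what layer 1 wrote (K-composition). It assumes one-hot embeddings, one head per layer, a reserved BOS token 0, and a large fixed inverse temperature `beta`; the function names are hypothetical. It is an illustration only, not the paper's trained models or released code. Loosely, the "previous token" subspace that layer 1 writes and layer 2 reads plays the role of a shared bridge subspace, although here it is hard-coded rather than learned.

```python
import numpy as np


def softmax(scores: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


def causal_attention(q, k, v, beta):
    """One attention head with a causal mask: position i only sees j <= i."""
    t = q.shape[0]
    scores = beta * (q @ k.T)
    future = np.triu(np.ones((t, t), dtype=bool), k=1)  # True where j > i
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v


def induction_circuit(tokens, vocab_size, beta=20.0):
    """Hypothetical toy circuit; returns the predicted next token at each position."""
    t = len(tokens)
    tok = np.eye(vocab_size)[tokens]  # token subspace of the residual stream
    pos = np.eye(t)                   # one-hot positional codes

    # Layer 1 (previous-token head): the query at i is position i-1 and the key at j
    # is position j, so position i attends to i-1 and copies that token into a
    # separate "previous token" subspace.
    prev_query = np.vstack([np.zeros((1, t)), pos[:-1]])
    prev_tok = causal_attention(prev_query, pos, tok, beta)

    # Layer 2 (induction head, K-composition): the query is the current token and the
    # key is the layer-1 output, so position i attends to positions j whose
    # predecessor equals tokens[i], then copies tokens[j] as the prediction.
    out = causal_attention(tok, prev_tok, tok, beta)
    return out.argmax(axis=-1)


# Usage: a BOS token followed by a sequence of distinct tokens repeated twice.
rng = np.random.default_rng(0)
vocab_size, length = 50, 10
body = rng.choice(np.arange(1, vocab_size), size=length, replace=False)
tokens = np.concatenate([[0], body, body])
pred = induction_circuit(tokens, vocab_size)
second = np.arange(length + 1, 2 * length)  # second copy, excluding its last position
print(np.array_equal(pred[second], tokens[second + 1]))  # True
```

Because the circuit only matches token identities and never depends on which particular tokens appear, the rule "predict whatever followed the previous occurrence of the current token" holds for any token sequence with this repeat structure. This is the sense in which a composed rule can transfer beyond the examples it was built or trained on.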

