LG | AI | CL | Feb 19, 2025

Which Attention Heads Matter for In-Context Learning?

arXiv:2502.14010v1 | 52 citations | h-index: 3 | ICML
Originality: Incremental advance
AI Analysis

This work examines the mechanism behind in-context learning and gives AI researchers insight into model interpretability. It is an incremental advance, since it compares two previously proposed mechanisms rather than introducing a new one.

The study compares induction heads and function vector (FV) heads across 12 language models to determine which primarily drives in-context learning. Ablations show that few-shot performance depends mainly on FV heads, especially in larger models, and that many FV heads begin as induction heads during training.

Large language models (LLMs) exhibit impressive in-context learning (ICL) capability, enabling them to perform new tasks using only a few demonstrations in the prompt. Two different mechanisms have been proposed to explain ICL: induction heads that find and copy relevant tokens, and function vector (FV) heads whose activations compute a latent encoding of the ICL task. To better understand which of the two distinct mechanisms drives ICL, we study and compare induction heads and FV heads in 12 language models. Through detailed ablations, we discover that few-shot ICL performance depends primarily on FV heads, especially in larger models. In addition, we uncover that FV and induction heads are connected: many FV heads start as induction heads during training before transitioning to the FV mechanism. This leads us to speculate that induction facilitates learning the more complex FV mechanism that ultimately drives ICL.
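
To make the "find and copy relevant tokens" description of induction heads concrete, here is a minimal sketch in plain Python. It is a toy rule over token lists, not the paper's code and not a real attention head: the function name `induction_predict` is hypothetical, and the exact-match lookup is an assumption standing in for what a trained head does with learned attention weights.

```python
def induction_predict(tokens):
    """Toy induction rule (illustrative assumption, not the paper's method).

    Look back for the most recent earlier occurrence of the final token
    and return the token that followed it. Returns None when the final
    token has not appeared before.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from right to left, skipping the final position.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # "Copy" step: emit whatever came right after the match.
            return tokens[i + 1]
    return None


# Usage: the pattern "A B ... A" leads the rule to predict "B".
prediction = induction_predict(["A", "B", "C", "A"])
no_match = induction_predict(["A", "B", "C", "D"])
```

In this sketch the rule can only repeat a token it has already seen, whereas an FV head, as described in the abstract, is said to compute a latent encoding of the task itself.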

Code Implementations: 1 repo