LG · Feb 29, 2024

Dual Operating Modes of In-Context Learning

arXiv:2402.18819v227.853 citations · h-index: 6 · Has Code · ICML

Originality: Incremental advance

AI Analysis

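The paper provides a theoretical foundation for understanding in-context learning and addresses a gap in existing models, which explain only one operating mode at a time. The advance is incremental: it builds on prior models of pretraining data and extends them with multiple task groups and task-dependent input distributions.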

The paper addresses the dual operating modes of in-context learning (task learning and task retrieval) by introducing a probabilistic model that explains both simultaneously. It derives a closed-form expression for the task posterior distribution and uses it to explain the "early ascent" phenomenon, in which the risk initially increases as more in-context examples are added.

In-context learning (ICL) exhibits dual operating modes: task learning, i.e., acquiring a new skill from in-context samples, and task retrieval, i.e., locating and activating a relevant pretrained skill. Recent theoretical work investigates various mathematical models to analyze ICL, but existing models explain only one operating mode at a time. We introduce a probabilistic model, with which one can explain the dual operating modes of ICL simultaneously. Focusing on in-context learning of linear functions, we extend existing models for pretraining data by introducing multiple task groups and task-dependent input distributions. We then analyze the behavior of the optimally pretrained model under the squared loss, i.e., the MMSE estimator of the label given in-context examples. Regarding pretraining task distribution as prior and in-context examples as the observation, we derive the closed-form expression of the task posterior distribution. With the closed-form expression, we obtain a quantitative understanding of the two operating modes of ICL. Furthermore, we shed light on an unexplained phenomenon observed in practice: under certain settings, the ICL risk initially increases and then decreases with more in-context examples. Our model offers a plausible explanation for this "early ascent" phenomenon: a limited number of in-context samples may lead to the retrieval of an incorrect skill, thereby increasing the risk, which will eventually diminish as task learning takes effect with more in-context samples. We also theoretically analyze ICL with biased labels, e.g., zero-shot ICL, where in-context examples are assigned random labels. Lastly, we validate our findings and predictions via experiments involving Transformers and large language models.
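The Bayesian view described in the abstract can be illustrated with a minimal sketch. It is not the paper's exact model: it assumes a Gaussian-mixture prior over linear task weights with a shared isotropic covariance, and it omits the task-dependent input distributions that the paper includes. The function name `gmm_prior_mmse` and the parameters `sigma_w` and `sigma_y` are hypothetical and chosen only for illustration.

```python
import numpy as np

def gmm_prior_mmse(X, y, means, weights, sigma_w=1.0, sigma_y=0.5):
    """Posterior over task groups and MMSE weight estimate (illustrative sketch).

    Assumed model (a simplification, not the paper's full setup):
      w ~ sum_m weights[m] * N(means[m], sigma_w^2 I)   # pretraining prior
      y = X @ w + noise,  noise ~ N(0, sigma_y^2 I)     # in-context examples
    """
    means = np.asarray(means, dtype=float)
    k, d = X.shape

    # Posterior precision of w; identical for every mixture component.
    A = np.eye(d) / sigma_w**2 + X.T @ X / sigma_y**2

    # Per-component posterior means, shape (M, d).
    rhs = means / sigma_w**2 + (X.T @ y) / sigma_y**2
    comp_means = np.linalg.solve(A, rhs.T).T

    # Marginal likelihood of y under each component:
    # y | group m ~ N(X mu_m, sigma_y^2 I + sigma_w^2 X X^T).
    # The log-determinant term is shared across components and cancels.
    C = sigma_y**2 * np.eye(k) + sigma_w**2 * (X @ X.T)
    resid = y[None, :] - means @ X.T              # shape (M, k)
    sol = np.linalg.solve(C, resid.T)             # shape (k, M)
    quad = np.sum(resid.T * sol, axis=0)          # shape (M,)

    # Reweighted mixture weights (the "task retrieval" part).
    log_post = np.log(weights) - 0.5 * quad
    log_post -= log_post.max()                    # numerical stability
    post = np.exp(log_post)
    post /= post.sum()

    # MMSE estimate: posterior-weighted average of component means.
    w_hat = post @ comp_means
    return post, w_hat
```

With few in-context examples, the estimate is dominated by the reweighted prior means (retrieval); as the number of examples grows, the data term in `A` and `rhs` dominates and the estimate approaches the least-squares solution (learning). A prediction for a query input `x_q` under this sketch would be `x_q @ w_hat`.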
