Re-examining learning linear functions in context
This work addresses the limited understanding of in-context learning mechanisms in LLMs and reveals fundamental limitations in their ability to infer abstract task structures. The contribution is incremental but provides new insights.
The paper investigates in-context learning of linear functions using synthetic data and GPT-2-like transformers, finding that models fail to generalize beyond their training distribution and do not use algorithmic approaches like linear regression.
In-context learning (ICL) has emerged as a powerful paradigm for easily adapting Large Language Models (LLMs) to various tasks. However, our understanding of how ICL works remains limited. We explore a simple model of ICL in a controlled setup with synthetic training data to investigate ICL of univariate linear functions. We experiment with a range of GPT-2-like transformer models trained from scratch. Our findings challenge the prevailing narrative that transformers adopt algorithmic approaches like linear regression to learn a linear function in-context. These models fail to generalize beyond their training distribution, highlighting fundamental limitations in their capacity to infer abstract task structures. Our experiments lead us to propose a mathematically precise hypothesis of what the model might be learning.
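To make the setup concrete, the following is a minimal sketch of what a synthetic in-context prompt for a univariate linear function and a least-squares reference predictor could look like. It is an illustration under stated assumptions, not the paper's actual pipeline: the bias-free form y = w * x, the uniform sampling ranges, the prompt length, and the names `sample_prompt` and `least_squares_predict` are all hypothetical choices made here.

```python
import numpy as np

def sample_prompt(rng, n_points=10, w_range=(-1.0, 1.0), x_range=(-1.0, 1.0)):
    """Sample one prompt of (x, y) pairs from y = w * x.

    Assumption: no bias term and uniform sampling ranges; the paper's
    actual distributions are not specified in the abstract.
    """
    w = rng.uniform(*w_range)
    x = rng.uniform(*x_range, size=n_points)
    y = w * x
    return w, x, y

def least_squares_predict(x_ctx, y_ctx, x_query):
    """Fit y = w * x to the context pairs by least squares, then predict.

    For a single bias-free weight, the closed-form solution is
    w_hat = sum(x * y) / sum(x * x).
    """
    w_hat = np.dot(x_ctx, y_ctx) / np.dot(x_ctx, x_ctx)
    return w_hat * x_query

rng = np.random.default_rng(0)

# In-distribution prompt: the last pair is held out as the query.
w, x, y = sample_prompt(rng)
pred_in = least_squares_predict(x[:-1], y[:-1], x[-1])

# Shifted prompt: the weight is drawn from outside the assumed training range.
w_ood, x_ood, y_ood = sample_prompt(rng, w_range=(5.0, 10.0))
pred_ood = least_squares_predict(x_ood[:-1], y_ood[:-1], x_ood[-1])

print(abs(pred_in - y[-1]), abs(pred_ood - y_ood[-1]))
```

A regression-based predictor of this kind recovers the held-out value on noiseless data regardless of where the weight is drawn from, so its error stays near zero under the shift. That is the behavior against which a trained transformer's out-of-distribution predictions can be compared.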