Learning Task Representations from In-Context Learning
This work addresses a challenge that matters to researchers and practitioners: interpreting in-context learning and understanding how it generalizes. The contribution appears incremental, however, because it builds on existing transformer-based methods.
The paper tackles the problem of understanding how large language models internally encode and generalize tasks during in-context learning. It introduces an automated method that computes task vectors from attention heads; the method extracts task-specific information and performs well on both text and regression tasks.
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning (ICL), where models adapt to new tasks through example-based prompts without requiring parameter updates. However, understanding how tasks are internally encoded and generalized remains a challenge. To address some of the empirical and technical gaps in the literature, we introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads within the transformer architecture. This approach computes a single task vector as a weighted sum of attention heads, with the weights optimized causally via gradient descent. Our findings show that existing methods fail to generalize effectively to modalities beyond text. In response, we also design a benchmark to evaluate whether a task vector can preserve task fidelity in functional regression tasks. The proposed method successfully extracts task-specific information from in-context demonstrations and excels in both text and regression tasks, demonstrating its generalizability across modalities.
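As a rough illustration of the core idea, a single task vector formed as a weighted sum of attention-head outputs with the weights fit by gradient descent, the toy sketch below may help. It is not the paper's implementation: the head activations are hard-coded toy vectors, the squared-error objective against a known target is a stand-in for whatever loss the method actually optimizes, and every name (`task_vector`, `fit_head_weights`, `head_outputs`, `true_weights`) is hypothetical.

```python
# Toy sketch only: hard-coded "head activations" and a stand-in loss.
# None of these names or values come from the paper.

def task_vector(weights, head_outputs):
    """Weighted sum of per-head activation vectors: v = sum_h w_h * a_h."""
    dim = len(head_outputs[0])
    return [sum(w * a[i] for w, a in zip(weights, head_outputs)) for i in range(dim)]


def squared_error(vec, target):
    """Stand-in objective (assumption): squared distance to a target vector."""
    return sum((v - t) ** 2 for v, t in zip(vec, target))


def fit_head_weights(head_outputs, target, lr=0.1, steps=200):
    """Plain gradient descent on the head weights, starting from zero."""
    weights = [0.0] * len(head_outputs)
    for _ in range(steps):
        vec = task_vector(weights, head_outputs)
        residual = [v - t for v, t in zip(vec, target)]
        # Gradient of the squared error w.r.t. w_h is 2 * <residual, a_h>.
        grads = [2.0 * sum(r * x for r, x in zip(residual, a)) for a in head_outputs]
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


# Three hypothetical heads, each emitting a 4-dimensional activation.
head_outputs = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
true_weights = [1.0, 0.0, 0.5]  # toy ground truth used to build the target
target = task_vector(true_weights, head_outputs)

initial_loss = squared_error(task_vector([0.0, 0.0, 0.0], head_outputs), target)
learned = fit_head_weights(head_outputs, target)
final_loss = squared_error(task_vector(learned, head_outputs), target)

print("learned head weights:", [round(w, 4) for w in learned])
print("loss before and after:", initial_loss, final_loss)
```

In this toy setting the learned weights indicate which heads contribute to the target vector. In a real model, the head outputs would come from a forward pass over the in-context demonstrations, and the objective would be tied to the model's downstream predictions rather than to a known target.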