LG | AI | Dec 12, 2025

A Simple Generalisation of the Implicit Dynamics of In-Context Learning

arXiv:2512.11255v1 | 2 citations | h-index: 6
Originality: Synthesis-oriented
AI Analysis

This work incrementally advances the theoretical understanding of in-context learning in transformers and brings the theory closer to validation on large-scale models.

The paper generalizes a prior theory of in-context learning (Dherin et al., 2025), extending implicit weight updates in transformers to all sequence positions, to any block, and to more realistic residual blocks that include layer normalization. It validates the theory empirically on simple in-context linear regression tasks.

In-context learning (ICL) refers to the ability of a model to learn new tasks from examples in its input without any parameter updates. In contrast to previous theories of ICL relying on toy models and data settings, recently it has been shown that an abstraction of a transformer block can be seen as implicitly updating the weights of its feedforward network according to the context (Dherin et al., 2025). Here, we provide a simple generalisation of this result for (i) all sequence positions beyond the last, (ii) any transformer block beyond the first, and (iii) more realistic residual blocks including layer normalisation. We empirically verify our theory on simple in-context linear regression tasks and investigate the relationship between the implicit updates related to different tokens within and between blocks. These results help to bring the theory of Dherin et al. (2025) even closer to practice, with potential for validation on large-scale models.
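The implicit-update idea can be illustrated with a small numerical check. The sketch below is not the authors' code. It assumes a toy block with no skip connection and no layer normalisation: single-head attention with identity projections, followed by the first linear layer `W` of a feedforward network. All names (`attend`, `implicit_update`, `W`, `context`, `x`) are hypothetical. Under these assumptions, if `a_ctx` is the attention output for the query with context and `a_alone` the output without it, the rank-1 update `dW = (W (a_ctx - a_alone)) a_alone^T / ||a_alone||^2` makes the context-free pass with `W + dW` reproduce the pre-activations of the contextual pass with `W`.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, tokens):
    """Toy single-head attention with identity projections (an assumption)."""
    weights = softmax([sum(q * t for q, t in zip(query, tok)) for tok in tokens])
    return [sum(w * tok[i] for w, tok in zip(weights, tokens))
            for i in range(len(query))]


def implicit_update(W, a_with_ctx, a_without_ctx):
    """Rank-1 update dW = (W da) a^T / ||a||^2, where da = a_with_ctx - a."""
    da = [p - q for p, q in zip(a_with_ctx, a_without_ctx)]
    W_da = matvec(W, da)
    norm_sq = sum(q * q for q in a_without_ctx)
    return [[u * q / norm_sq for q in a_without_ctx] for u in W_da]


random.seed(0)
dim, hidden, n_ctx = 4, 6, 5
W = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(hidden)]
context = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_ctx)]
x = [random.gauss(0, 1) for _ in range(dim)]

a_ctx = attend(x, context + [x])  # query attends to the context and itself
a_alone = attend(x, [x])          # no context: the output is x itself

dW = implicit_update(W, a_ctx, a_alone)
W_new = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, dW)]

# Pre-activations of the first feedforward layer under both views.
pre_with_ctx = matvec(W, a_ctx)
pre_updated = matvec(W_new, a_alone)
max_gap = max(abs(p - q) for p, q in zip(pre_with_ctx, pre_updated))
print(max_gap)
```

Because the two pre-activation vectors agree up to floating-point error, any nonlinearity and later layers applied on top would also agree. Extending the check to skip connections, layer normalisation, deeper blocks, or non-final positions is what the paper's generalisation addresses and is not attempted here.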

