AI · Jan 1

Context Collapse: In-Context Learning and Model Collapse

arXiv:2601.00923v1

Originality: Incremental advance

AI Analysis

The thesis addresses stability and learning dynamics in LLMs, offering incremental theoretical extensions to existing work.

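The thesis studies in-context learning and model collapse in large language models. For in-context learning, it shows that minimising the in-context loss of a weight-tied linear transformer produces a phase transition: above a critical context length, the learned solution acquires a skew-symmetric component. For model collapse, it proves almost sure convergence in simplified settings, so collapse occurs unless the data grows sufficiently fast or is retained over time.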

This thesis investigates two key phenomena in large language models (LLMs): in-context learning (ICL) and model collapse. We study ICL in a linear transformer with tied weights trained on linear regression tasks, and show that minimising the in-context loss leads to a phase transition in the learned parameters. Above a critical context length, the solution develops a skew-symmetric component. We prove this by reducing the forward pass of the linear transformer under weight tying to preconditioned gradient descent, and then analysing the optimal preconditioner. This preconditioner includes a skew-symmetric component, which induces a rotation of the gradient direction. For model collapse, we use martingale and random walk theory to analyse simplified settings (linear regression and Gaussian fitting) under both replacing and cumulative data regimes. We strengthen existing results by proving almost sure convergence, showing that collapse occurs unless the data grows sufficiently fast or is retained over time. Finally, we introduce the notion of context collapse: a degradation of context during long generations, especially in chain-of-thought reasoning. This concept links the dynamics of ICL with long-term stability challenges in generative models.
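The Gaussian-fitting setting mentioned in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the thesis's own code: the function name `simulate_gaussian_refits`, the regime labels, the use of the maximum-likelihood estimator, and the sample size and generation count are all assumptions chosen for illustration. Each generation draws samples from the current fitted Gaussian and refits it, either on the fresh samples only (replacing regime) or on all samples retained so far (cumulative regime).

```python
import numpy as np


def simulate_gaussian_refits(n_samples, n_generations, regime, seed=0):
    """Repeatedly refit a Gaussian to samples drawn from the previous fit.

    regime="replace":    each fit uses only the newly generated samples.
    regime="accumulate": samples from every generation are retained and
                         the fit uses the whole pool.
    Returns an array with the fitted standard deviation per generation.
    """
    if regime not in ("replace", "accumulate"):
        raise ValueError("regime must be 'replace' or 'accumulate'")

    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 1.0          # generation 0: the "true" distribution
    pool = np.empty(0)
    sigmas = []

    for _ in range(n_generations):
        # Generate synthetic data from the current model.
        fresh = rng.normal(mu, sigma, size=n_samples)
        if regime == "replace":
            pool = fresh
        else:
            pool = np.concatenate([pool, fresh])
        # Maximum-likelihood refit (np.std defaults to the 1/N estimator).
        mu, sigma = pool.mean(), pool.std()
        sigmas.append(sigma)

    return np.array(sigmas)


if __name__ == "__main__":
    for regime in ("replace", "accumulate"):
        sigmas = simulate_gaussian_refits(20, 2000, regime)
        print(f"{regime:>10}: final fitted std = {sigmas[-1]:.3e}")
```

In this toy run the fitted standard deviation under the replacing regime shrinks toward zero over generations, while retaining all data keeps it at order one, which is consistent with the qualitative claim that collapse is avoided when data is retained. It does not demonstrate the almost sure convergence results themselves.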
