LG · AI · Feb 19

In-Context Learning in Linear vs. Quadratic Attention Models: An Empirical Study on Regression Tasks

arXiv:2602.17171v1
Originality: Synthesis-oriented
AI Analysis

This study offers insight into how attention mechanisms support in-context learning, but its contribution is incremental: it builds on prior work on simple function classes.

The paper empirically compared in-context learning between linear and quadratic attention models on linear regression tasks, finding differences in learning quality, convergence, and generalization, with linear attention showing limitations relative to quadratic attention.

Recent work has demonstrated that transformers and linear attention models can perform in-context learning (ICL) on simple function classes, such as linear regression. In this paper, we empirically study how these two attention mechanisms differ in their ICL behavior on the canonical linear-regression task of Garg et al. We evaluate learning quality (MSE), convergence, and generalization behavior of each architecture. We also analyze how increasing model depth affects ICL performance. Our results illustrate both the similarities and limitations of linear attention relative to quadratic attention in this setting.
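To make the comparison in the abstract concrete, the following is a minimal NumPy sketch of the two ingredients it names: an in-context linear-regression prompt and the two attention mechanisms. It is an illustration under stated assumptions, not the paper's implementation. The Gaussian sampling of weights and inputs follows the usual description of the Garg et al. task; the function names, the default sizes, and the choice of unnormalized causal linear attention with no feature map are all assumptions made here, since the summary does not say which linear-attention variant the paper uses.

```python
import numpy as np


def sample_icl_prompt(rng, n_points=20, dim=5):
    """Sample one noiseless regression prompt: y_i = <w, x_i>.

    Assumption: w and every x_i are drawn from a standard normal.
    """
    w = rng.standard_normal(dim)
    xs = rng.standard_normal((n_points, dim))
    ys = xs @ w
    return xs, ys, w


def softmax_attention(q, k, v):
    """Causal softmax ("quadratic") attention over n tokens."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                    # (n, n) score matrix
    causal = np.tril(np.ones((n, n), dtype=bool))    # token t sees tokens <= t
    scores = np.where(causal, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def linear_attention(q, k, v):
    """Causal linear attention without softmax or normalization.

    Assumption: identity feature map. The running sum of outer products
    k_s v_s^T acts as a fixed-size state, so no (n, n) matrix is formed.
    """
    kv_state = np.cumsum(k[:, :, None] * v[:, None, :], axis=0)  # (n, d, d_v)
    return np.einsum("nd,nde->ne", q, kv_state)


def least_squares_mse(xs, ys, n_context):
    """MSE of a least-squares fit on the first n_context pairs, scored on the rest."""
    w_hat, *_ = np.linalg.lstsq(xs[:n_context], ys[:n_context], rcond=None)
    residual = xs[n_context:] @ w_hat - ys[n_context:]
    return float(np.mean(residual ** 2))
```

The least-squares helper is included only as a reference point for the MSE metric mentioned in the abstract: once the number of in-context examples reaches the input dimension, a least-squares fit solves the noiseless task essentially exactly, which gives a floor against which either attention model's in-context error could be read.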

