LGAISep 24, 2025

Linear Transformers Implicitly Discover Unified Numerical Algorithms

arXiv:2509.19702v12 citationsh-index: 8
Originality Highly original
AI Analysis

This work demonstrates that transformers can autonomously discover efficient numerical algorithms for matrix problems, which is significant for machine learning practitioners seeking adaptive solvers, though it is incremental in showing implicit learning capabilities.

The researchers trained a linear attention transformer on masked-block matrix completion tasks to predict missing blocks, and it implicitly discovered a unified, parameter-free update rule that achieves second-order convergence, reduces distributed iteration complexity, and maintains accuracy with rank-limited attention.

We train a linear attention transformer on millions of masked-block matrix completion tasks: each prompt is masked low-rank matrix whose missing block may be (i) a scalar prediction target or (ii) an unseen kernel slice of Nyström extrapolation. The model sees only input-output pairs and a mean-squared loss; it is given no normal equations, no handcrafted iterations, and no hint that the tasks are related. Surprisingly, after training, algebraic unrolling reveals the same parameter-free update rule across three distinct computational regimes (full visibility, rank-limited updates, and distributed computation). We prove that this rule achieves second-order convergence on full-batch problems, cuts distributed iteration complexity, and remains accurate with rank-limited attention. Thus, a transformer trained solely to patch missing blocks implicitly discovers a unified, resource-adaptive iterative solver spanning prediction, estimation, and Nyström extrapolation, highlighting a powerful capability of in-context learning.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes