LG AISep 24, 2025

Linear Transformers Implicitly Discover Unified Numerical Algorithms

Patrick Lutz, Aditya Gangrade, Hadi Daneshmand, Venkatesh Saligrama

arXiv:2509.19702v19.42 citationsh-index: 8

Originality Highly original

AI Analysis

This work demonstrates that transformers can autonomously discover efficient numerical algorithms for matrix problems, which is significant for machine learning practitioners seeking adaptive solvers, though it is incremental in showing implicit learning capabilities.

The researchers trained a linear attention transformer on masked-block matrix completion tasks to predict missing blocks, and it implicitly discovered a unified, parameter-free update rule that achieves second-order convergence, reduces distributed iteration complexity, and maintains accuracy with rank-limited attention.

We train a linear attention transformer on millions of masked-block matrix completion tasks: each prompt is masked low-rank matrix whose missing block may be (i) a scalar prediction target or (ii) an unseen kernel slice of Nyström extrapolation. The model sees only input-output pairs and a mean-squared loss; it is given no normal equations, no handcrafted iterations, and no hint that the tasks are related. Surprisingly, after training, algebraic unrolling reveals the same parameter-free update rule across three distinct computational regimes (full visibility, rank-limited updates, and distributed computation). We prove that this rule achieves second-order convergence on full-batch problems, cuts distributed iteration complexity, and remains accurate with rank-limited attention. Thus, a transformer trained solely to patch missing blocks implicitly discovers a unified, resource-adaptive iterative solver spanning prediction, estimation, and Nyström extrapolation, highlighting a powerful capability of in-context learning.

View on arXiv PDF

Similar