LG · ML · Mar 14, 2025

Test-Time Training Provably Improves Transformers as In-context Learners

arXiv:2503.11842v1 · 11 citations · h-index: 40 · ICML
Originality: Incremental advance
AI Analysis

This work addresses distribution shift and sample efficiency in in-context learning with transformers, with an empirical focus on tabular classification. The advance is incremental because it builds on existing test-time training (TTT) methods.

The paper uses test-time training (TTT) to improve transformers as in-context learners. Its main empirical result is that TTT cuts the sample size required for tabular classification by a factor of 3 to 5, improving inference efficiency at negligible training cost.

Test-time training (TTT) methods explicitly update the weights of a model to adapt to the specific test instance, and they have found success in a variety of settings, including, most recently, language modeling and reasoning. To demystify this success, we investigate a gradient-based TTT algorithm for in-context learning, where we train a transformer model on the in-context demonstrations provided in the test prompt. Specifically, we provide a comprehensive theoretical characterization of linear transformers when the update rule is a single gradient step. Our theory (i) delineates the role of alignment between pretraining distribution and target task, (ii) demystifies how TTT can alleviate distribution shift, and (iii) quantifies the sample complexity of TTT, including how it can significantly reduce the eventual sample size required for in-context learning. As our empirical contribution, we study the benefits of TTT for TabPFN, a tabular foundation model. In line with our theory, we demonstrate that TTT significantly reduces the required sample size for tabular classification (3 to 5 times fewer), unlocking substantial inference efficiency with a negligible training cost.
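
To make the single-gradient-step update described in the abstract concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's exact construction: the one-layer linear-attention predictor is assumed to take the simplified form y_hat = x^T W (X^T y / n), the same demonstrations serve as both context and training targets, and the names `predict`, `ttt_loss`, and `ttt_step` are hypothetical.

```python
import numpy as np


def predict(W, X_ctx, y_ctx, x_query):
    """Assumed simplified linear-attention prediction: x^T W (X^T y / n)."""
    h = X_ctx.T @ y_ctx / len(y_ctx)  # context summary vector, shape (d,)
    return x_query @ W @ h


def ttt_loss(W, X, y):
    """Half mean squared error of the model on its own demonstrations."""
    h = X.T @ y / len(y)
    residual = X @ W @ h - y
    return 0.5 * np.mean(residual ** 2)


def ttt_step(W, X, y, lr):
    """One gradient step on W using the in-context demonstrations (X, y).

    The context summary h is treated as a fixed function of the data,
    so the loss is quadratic in W with gradient (1/k) * (X^T r) h^T.
    """
    h = X.T @ y / len(y)
    residual = X @ W @ h - y                    # shape (k,)
    grad = np.outer(X.T @ residual, h) / len(y)  # shape (d, d)
    return W - lr * grad


# Toy usage: a linear task y = X @ beta, starting from W = I as a
# stand-in for pretrained weights (an assumption for illustration).
rng = np.random.default_rng(0)
d, k = 5, 20
X = rng.standard_normal((k, d))
beta = rng.standard_normal(d)
beta /= np.linalg.norm(beta)
y = X @ beta

W0 = np.eye(d)
W1 = ttt_step(W0, X, y, lr=0.01)
print("loss before:", ttt_loss(W0, X, y))
print("loss after: ", ttt_loss(W1, X, y))
```

With a small enough step size, the loss on the demonstrations decreases after the update, which is the mechanism by which the adapted weights can compensate for a mismatch between the pretraining distribution and the target task.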
