ML · LG · ST · Jan 21

Efficient and Minimax-optimal In-context Nonparametric Regression with Transformers

arXiv:2601.15014v1 · 3 citations · h-index: 39
Originality: Incremental advance
AI Analysis

This work gives an efficient approach to in-context regression: it reduces the required model size and pretraining data while retaining the optimal convergence rate. Its originality is incremental, since it builds on existing transformer methods.

The paper tackles in-context learning for nonparametric regression with α-Hölder smooth regression functions. It proves that a pretrained transformer with a number of parameters logarithmic in n and a polynomial number of pretraining sequences achieves the minimax-optimal convergence rate O(n^{-2α/(2α+d)}) in mean squared error, using fewer parameters and pretraining sequences than prior work.
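To make the scalings concrete, the sketch below evaluates the three quantities in the summary with all constants dropped: the minimax MSE rate n^{-2α/(2α+d)}, the order of the pretraining-sequence requirement n^{2α/(2α+d)} log^3 n, and the Θ(log n) parameter count. The function names are hypothetical and only illustrate orders of growth; they are not the paper's bounds with constants.

```python
import math


def minimax_mse_rate(n: int, alpha: float, d: int) -> float:
    """Order of the minimax MSE rate n^{-2a/(2a+d)} (constants dropped)."""
    return n ** (-2 * alpha / (2 * alpha + d))


def pretraining_sequences_scale(n: int, alpha: float, d: int) -> float:
    """Order of the pretraining-sequence requirement n^{2a/(2a+d)} * log^3 n."""
    return n ** (2 * alpha / (2 * alpha + d)) * math.log(n) ** 3


def parameter_scale(n: int) -> float:
    """Order of the transformer parameter count, Theta(log n)."""
    return math.log(n)


if __name__ == "__main__":
    # Smooth (alpha = 1), one-dimensional covariates: exponent 2/3.
    print(minimax_mse_rate(1000, 1.0, 1))
    # Same smoothness in d = 10: exponent only 1/6 (curse of dimensionality).
    print(minimax_mse_rate(10**6, 1.0, 10))
    print(pretraining_sequences_scale(1000, 1.0, 1), parameter_scale(1000))
```

For α = 1 and d = 1 the rate exponent is 2/3, so n = 1000 gives a rate of about 0.01; with d = 10 the exponent drops to 1/6, and even n = 10^6 only reaches about 0.1. The pretraining-sequence order for n = 1000, α = 1, d = 1 is roughly 3.3 × 10^4, while log n is only about 6.9.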

We study in-context learning for nonparametric regression with $α$-Hölder smooth regression functions, for some $α>0$. We prove that, with $n$ in-context examples and $d$-dimensional regression covariates, a pretrained transformer with $Θ(\log n)$ parameters and $Ω\bigl(n^{2α/(2α+d)}\log^3 n\bigr)$ pretraining sequences can achieve the minimax-optimal rate of convergence $O\bigl(n^{-2α/(2α+d)}\bigr)$ in mean squared error. Our result requires substantially fewer transformer parameters and pretraining sequences than previous results in the literature. This is achieved by showing that transformers are able to approximate local polynomial estimators efficiently by implementing a kernel-weighted polynomial basis and then running gradient descent.
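The abstract's mechanism, a kernel-weighted polynomial basis followed by gradient descent, can be illustrated with the classical local polynomial estimator it emulates. The sketch below is a one-dimensional version written directly in Python, not the transformer construction from the paper: the Gaussian kernel, the basis scaling ((x - x0)/h)^k, the step size, and the iteration count are all assumptions chosen so that plain gradient descent converges on the example data. The function name `local_polynomial_gd` is hypothetical.

```python
import math
from typing import List, Sequence


def local_polynomial_gd(
    xs: Sequence[float],
    ys: Sequence[float],
    x0: float,
    h: float,
    degree: int = 1,
    steps: int = 500,
    lr: float = 0.5,
) -> float:
    """Estimate f(x0) by a local polynomial fit solved with gradient descent (1-D sketch)."""
    # Kernel weights w_i = K((x_i - x0) / h); the Gaussian kernel is an assumption.
    weights = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    total = sum(weights)
    # Polynomial basis centred at x0: phi_k(x) = ((x - x0) / h) ** k, k = 0..degree.
    feats = [[((x - x0) / h) ** k for k in range(degree + 1)] for x in xs]
    beta: List[float] = [0.0] * (degree + 1)
    for _ in range(steps):
        # Gradient of (1 / 2W) * sum_i w_i * (phi(x_i) . beta - y_i) ** 2, W = sum_i w_i.
        grad = [0.0] * (degree + 1)
        for w, phi, y in zip(weights, feats, ys):
            resid = sum(b * p for b, p in zip(beta, phi)) - y
            for k in range(degree + 1):
                grad[k] += w * resid * phi[k] / total
        beta = [b - lr * g for b, g in zip(beta, grad)]
    # The intercept of the centred local polynomial estimates f(x0).
    return beta[0]


if __name__ == "__main__":
    xs = [i / 50 for i in range(51)]
    # A local linear fit reproduces a linear function exactly: f(0.5) = 2.
    print(local_polynomial_gd(xs, [2 * x + 1 for x in xs], x0=0.5, h=0.2))
```

Because the basis is centred at x0, the estimate is just the intercept, and a degree-p fit recovers any polynomial of degree at most p without error, so the example prints a value close to 2.0. The step size must stay below 2 divided by the largest eigenvalue of the kernel-weighted Gram matrix; scaling the basis by h keeps that matrix well conditioned here. In classical analyses of this estimator the bandwidth is taken as h of order n^{-1/(2α+d)} to balance bias and variance.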
