ML, LG, NE, NA, PR · Feb 5, 2025

Is In-Context Universality Enough? MLPs are Also Universal In-Context

ETH Zurich
arXiv:2502.03327v1 · 4 citations · h-index: 7
Originality: Synthesis-oriented
AI Analysis

The result challenges the assumption that in-context universality is the key to transformers' advantage and suggests that other factors, such as inductive bias or training stability, matter more. The contribution is incremental: it refines understanding without introducing a new method.

The paper tackles the question of whether in-context universality explains transformers' success by proving that MLPs with trainable activation functions are also universal in-context, showing this property is not unique to transformers.

The success of transformers is often linked to their ability to perform in-context learning. Recent work shows that transformers are universal in context, capable of approximating any real-valued continuous function of a context (a probability measure over $\mathcal{X}\subseteq \mathbb{R}^d$) and a query $x\in \mathcal{X}$. This raises the question: Does in-context universality explain their advantage over classical models? We answer this in the negative by proving that MLPs with trainable activation functions are also universal in-context. This suggests the transformer's success is likely due to other factors like inductive bias or training stability.
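To make the object being approximated concrete, here is a minimal sketch of an in-context function: a real-valued map of a context (treated as an empirical probability measure over points in $\mathbb{R}^d$) and a query $x$. The specific target `in_context_target`, the Gaussian kernel, and the `bandwidth` parameter are illustrative assumptions and do not come from the paper; the sketch does not reproduce the paper's MLP construction. It only checks that such a function depends on the context through the measure alone, so reordering or duplicating the context points leaves the value unchanged.

```python
import math
import random


def in_context_target(context, query, bandwidth=1.0):
    """Hypothetical target f(mu, x): mean Gaussian kernel between the
    query x and the points of the empirical context measure mu."""
    total = 0.0
    for point in context:
        sq_dist = sum((p - q) ** 2 for p, q in zip(point, query))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    # Dividing by the number of points makes this an integral against
    # the empirical measure, not a sum over a sequence.
    return total / len(context)


random.seed(0)
# Context: five sample points in R^2 (assumed toy data).
context = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5)]
query = (0.0, 0.0)

value = in_context_target(context, query)
# Same measure, different ordering of the points.
shuffled = in_context_target(random.sample(context, k=len(context)), query)
# Same measure, every point listed twice.
duplicated = in_context_target(context + context, query)

print(math.isclose(value, shuffled), math.isclose(value, duplicated))
```

Any model claimed to be universal in-context, whether a transformer or an MLP, has to approximate maps of this kind uniformly over contexts and queries.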

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
