ML LG NE NA PR

Feb 5, 2025

Is In-Context Universality Enough? MLPs are Also Universal In-Context

ETH Zurich

arXiv:2502.03327v1

14.04 citations

h-index: 7

Originality: Synthesis-oriented

AI Analysis

The result challenges the assumption that in-context universality is key to transformers' advantage and suggests that other factors, such as inductive bias or training stability, matter more. The contribution is incremental: it refines understanding without introducing a new method.

The paper tackles the question of whether in-context universality explains transformers' success by proving that MLPs with trainable activation functions are also universal in-context, showing this property is not unique to transformers.

The success of transformers is often linked to their ability to perform in-context learning. Recent work shows that transformers are universal in context, capable of approximating any real-valued continuous function of a context (a probability measure over $\mathcal{X}\subseteq \mathbb{R}^d$) and a query $x\in \mathcal{X}$. This raises the question: Does in-context universality explain their advantage over classical models? We answer this in the negative by proving that MLPs with trainable activation functions are also universal in-context. This suggests the transformer's success is likely due to other factors like inductive bias or training stability.
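To make the object being approximated concrete, the following is a minimal sketch of one in-context function: a map that takes a context (here the empirical measure of a finite set of points in $\mathbb{R}^d$) and a query $x$, and returns a real number. The target `in_context_target`, its Gaussian kernel, and the `bandwidth` parameter are illustrative assumptions and are not taken from the paper; the sketch does not implement the paper's MLP construction. It only shows that such a function depends on the context through the measure alone, so reordering or duplicating the context points leaves its value unchanged.

```python
import math
import random


def in_context_target(context, query, bandwidth=1.0):
    """Hypothetical target f(mu, x) = E_{y ~ mu}[exp(-||x - y||^2 / (2 h^2))].

    `context` is a list of points; mu is their empirical measure
    (each point carries weight 1 / len(context)).
    """
    total = 0.0
    for point in context:
        sq_dist = sum((q - p) ** 2 for q, p in zip(query, point))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / len(context)


rng = random.Random(0)
context = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(8)]
query = (0.25, -0.5)

value = in_context_target(context, query)

# Reordering the context does not change the empirical measure.
shuffled = context[:]
rng.shuffle(shuffled)
value_shuffled = in_context_target(shuffled, query)

# Duplicating every point also leaves the empirical measure unchanged.
value_doubled = in_context_target(context + context, query)

print(value, value_shuffled, value_doubled)
```

The three printed values agree up to floating-point rounding. A universality statement of the kind described in the abstract says that a model class can approximate any continuous function of this (context, query) type, of which the kernel average here is one simple instance.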

