LGMar 28, 2025

Concise One-Layer Transformers Can Do Function Evaluation (Sometimes)

arXiv:2503.22076v11 citationsh-index: 4
Originality Incremental advance
AI Analysis

This work addresses a foundational problem in understanding transformer computation for researchers in theoretical machine learning, though it is incremental in scope.

The paper investigates the expressive capacity of transformers for evaluating arbitrary functions from [n] to [n], proving that concise 1-layer transformers can perform this task under certain input representations but not others, and showing experimental alignment with practical learning.

While transformers have proven enormously successful in a range of tasks, their fundamental properties as models of computation are not well understood. This paper contributes to the study of the expressive capacity of transformers, focusing on their ability to perform the fundamental computational task of evaluating an arbitrary function from $[n]$ to $[n]$ at a given argument. We prove that concise 1-layer transformers (i.e., with a polylog bound on the product of the number of heads, the embedding dimension, and precision) are capable of doing this task under some representations of the input, but not when the function's inputs and values are only encoded in different input positions. Concise 2-layer transformers can perform the task even with the more difficult input representation. Experimentally, we find a rough alignment between what we have proven can be computed by concise transformers and what can be practically learned.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes