LGMay 29

Fixed Universal Transformers

arXiv:2605.3142362.5
Predicted impact top 33% in LG · last 90 daysOriginality Highly original
AI Analysis

This work addresses the fundamental question of where a transformer's expressive power resides, suggesting it may be more in the input representation than learned weights, which is significant for researchers in transformer architecture and representation learning.

This paper introduces universal transformers, which are fixed transformers capable of simulating any other transformer within a specified class through a suitable input embedding. The authors provide sparse constructions for achieving universality and demonstrate that randomly initialized transformers are almost surely universal, validating this empirically on tasks like parenthesis balancing and multi-hop reasoning.

We introduce \emph{universal transformers}: fixed transformers that can simulate any transformer in a given class via a suitable input embedding. Analogous to a universal Turing machine, the input embedding encodes a description of the target model while all internal parameters remain fixed. We provide explicit sparse constructions achieving universality when the embedding dimension is sufficiently large, and further show that universality is generic: randomly initialized transformers are universal almost surely, which aligns with recent empirical results of Zhong and Andreas (2024). We empirically validate our theory on the algorithmic tasks of parenthesis balancing and multi-hop reasoning. Our results suggest that much of a transformer's expressive power may reside in its input representation rather than its learned weights.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes