LGJan 23

On the Expressive Power of Floating-Point Transformers

arXiv:2601.16450v11 citationsh-index: 3
Originality Incremental advance
AI Analysis

This addresses the gap between theoretical models and practical implementations in machine learning, revealing limitations in floating-point transformers for long sequences.

The paper investigates the representability of transformers using floating-point parameters and operations, showing they can represent non-permutation-equivariant functions without positional encoding and all permutation-equivariant functions for bounded sequence lengths, but not for large ones, with minimal equivariance structures identified.

The study on the expressive power of transformers shows that transformers are permutation equivariant, and they can approximate all permutation-equivariant continuous functions on a compact domain. However, these results are derived under real parameters and exact operations, while real implementations on computers can only use a finite set of numbers and inexact machine operations with round-off errors. In this work, we investigate the representability of floating-point transformers that use floating-point parameters and floating-point operations. Unlike existing results under exact operations, we first show that floating-point transformers can represent a class of non-permutation-equivariant functions even without positional encoding. Furthermore, we prove that floating-point transformers can represent all permutation-equivariant functions when the sequence length is bounded, but they cannot when the sequence length is large. We also found the minimal equivariance structure in floating-point transformers, and show that all non-trivial additive positional encoding can harm the representability of floating-point transformers.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes