Average Attention Transformers and Arithmetic Circuits

arXiv:2605.0468310.0h-index: 2

Predicted impact top 76% in CC · last 90 daysOriginality Incremental advance

AI Analysis

This work provides a theoretical characterization of transformer expressiveness for sequence-to-sequence functions, relevant to understanding their computational limits.

The paper analyzes the computational power of transformer encoders, showing that average hard attention can simulate arithmetic circuits of constant depth with unbounded addition, multiplication, and sign gates, and that transformers with average attention compute functions within the same circuit class.

We analyse the computational power of transformer encoders as sequence-to-sequence functions on vectors. We show that average hard attention can be used to simulate arithmetic circuits if they are given as an input to an encoder. The circuit families that can be simulated this way have constant depth while using unbounded addition, binary multiplication and sign gates. The transformers we use have arithmetic circuits instead of feed-forward networks. With typical average attention the functions they compute are also computed by the same class of circuit families. Our results hold for transformers over the reals, rationals and any ring in between the two.

View on arXiv PDF

Similar