Transformers in Uniform TC$^0$
This work provides a theoretical foundation for understanding the expressiveness of transformers in computational complexity, which is incremental but refines prior bounds for researchers in theoretical computer science and machine learning.
The paper tackles the problem of characterizing the computational complexity of transformers with average-hard and softmax attention, showing that they can be recognized within the DLOGTIME-uniform TC$^0$ circuit complexity class without approximation or with specific precision bounds, improving previous results that required limited-precision arithmetic.
Previous work has shown that the languages recognized by average-hard attention transformers (AHATs) and softmax-attention transformers (SMATs) are within the circuit complexity class TC$^0$. However, these results assume limited-precision arithmetic: using floating-point numbers with O(log n) bits (where n is the length of the input string), Strobl showed that AHATs can be approximated in L-uniform TC$^0$, and Merrill and Sabharwal showed that SMATs can be approximated in DLOGTIME-uniform TC$^0$. Here, we improve these results, showing that AHATs with no approximation, SMATs with O(poly(n)) bits of floating-point precision, and SMATs with at most $2^{-O(poly(n))}$ absolute error are all in DLOGTIME-uniform TC$^0$.