Tighter Bounds on the Expressivity of Transformer Encoders
This work addresses a foundational theoretical problem in machine learning for researchers studying the capabilities and limitations of transformer architectures, representing an incremental advance by bridging existing bounds.
The paper tackles the problem of characterizing the expressivity of transformer encoders by connecting and strengthening prior results, identifying a variant of first-order logic with counting quantifiers that serves as both an upper bound for fixed-precision transformers and a lower bound for general transformers, bringing researchers closer to an exact characterization of the languages they recognize.
Characterizing neural networks in terms of better-understood formal systems has the potential to yield new insights into the power and limitations of these networks. Doing so for transformers remains an active area of research. Bhattamishra and others have shown that transformer encoders are at least as expressive as a certain kind of counter machine, while Merrill and Sabharwal have shown that fixed-precision transformer encoders recognize only languages in uniform $TC^0$. We connect and strengthen these results by identifying a variant of first-order logic with counting quantifiers that is simultaneously an upper bound for fixed-precision transformer encoders and a lower bound for transformer encoders. This brings us much closer than before to an exact characterization of the languages that transformer encoders recognize.