Lower bounds on transformers with infinite precision
This work addresses a theoretical gap in understanding transformer limitations for researchers in machine learning theory, but it is incremental as it builds on prior tasks.
The authors tackled the problem of establishing lower bounds on the capabilities of one-layer softmax transformers with infinite precision, proving the first such lower bounds for function composition and the SUM$_2$ task.
In this note, we use the VC dimension technique to prove the first lower bound against one-layer softmax transformers with infinite precision. We do so for two tasks: function composition, considered by Peng, Narayanan, and Papadimitriou, and the SUM$_2$ task, considered by Sanford, Hsu, and Telgarsky.