Comparison of different Unique hard attention transformer models by the formal languages they can recognize
This is an incremental theoretical analysis for researchers in formal language theory and transformer models.
The paper surveys the capabilities of unique hard attention transformer encoders (UHATs) to recognize formal languages, comparing variants like masked vs. non-masked and providing bounds in terms of first-order logic and circuit complexity.
This note is a survey of various results on the capabilities of unique hard attention transformers encoders (UHATs) to recognize formal languages. We distinguish between masked vs. non-masked, finite vs. infinite image and general vs. bilinear attention score functions. We recall some relations between these models, as well as a lower bound in terms of first-order logic and an upper bound in terms of circuit complexity.