Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages
This provides a foundational theoretical insight for researchers in formal language theory and transformer models, though it is incremental in extending known results from LTL to transformers.
The paper tackled the problem of characterizing the expressive power of transformers with hard attention and masking, showing they are exactly equivalent to linear temporal logic and recognize star-free languages. The result establishes an exact equivalence without position embeddings and strict masking.
The expressive power of transformers over inputs of unbounded size can be studied through their ability to recognize classes of formal languages. In this paper, we establish exact characterizations of transformers with hard attention (in which all attention is focused on exactly one position) and attention masking (in which each position only attends to positions on one side). With strict masking (each position cannot attend to itself) and without position embeddings, these transformers are expressively equivalent to linear temporal logic (LTL), which defines exactly the star-free languages. A key technique is the use of Boolean RASP as a convenient intermediate language between transformers and LTL. We then take numerous results known for LTL and apply them to transformers, showing how position embeddings, strict masking, and depth all increase expressive power.