CL CC LGAug 6, 2023

Average-Hard Attention Transformers are Constant-Depth Uniform Threshold Circuits

arXiv:2308.03212v26.824 citationsh-index: 4

Originality Synthesis-oriented

AI Analysis

This is an incremental theoretical contribution that clarifies the computational complexity of transformer models for researchers in formal language theory and neural network analysis.

The paper extends a prior result by showing that average-hard attention transformers can be simulated by constant-depth uniform threshold circuits, matching the uniformity previously established for log-precision transformers.

Transformers have emerged as a widely used neural network model for various natural language processing tasks. Previous research explored their relationship with constant-depth threshold circuits, making two assumptions: average-hard attention and logarithmic precision for internal computations relative to input length. Merrill et al. (2022) prove that average-hard attention transformers recognize languages that fall within the complexity class TC0, denoting the set of languages that can be recognized by constant-depth polynomial-size threshold circuits. Likewise, Merrill and Sabharwal (2023) show that log-precision transformers recognize languages within the class of uniform TC0. This shows that both transformer models can be simulated by constant-depth threshold circuits, with the latter being more robust due to generating a uniform circuit family. Our paper shows that the first result can be extended to yield uniform circuits as well.

View on arXiv PDF

Similar