CLCCLGAug 6, 2023

Average-Hard Attention Transformers are Constant-Depth Uniform Threshold Circuits

arXiv:2308.03212v224 citationsh-index: 4
AI Analysis

This is an incremental theoretical contribution that clarifies the computational complexity of transformer models for researchers in formal language theory and neural network analysis.

The paper extends a prior result by showing that average-hard attention transformers can be simulated by constant-depth uniform threshold circuits, matching the uniformity previously established for log-precision transformers.

Transformers have emerged as a widely used neural network model for various natural language processing tasks. Previous research explored their relationship with constant-depth threshold circuits, making two assumptions: average-hard attention and logarithmic precision for internal computations relative to input length. Merrill et al. (2022) prove that average-hard attention transformers recognize languages that fall within the complexity class TC0, denoting the set of languages that can be recognized by constant-depth polynomial-size threshold circuits. Likewise, Merrill and Sabharwal (2023) show that log-precision transformers recognize languages within the class of uniform TC0. This shows that both transformer models can be simulated by constant-depth threshold circuits, with the latter being more robust due to generating a uniform circuit family. Our paper shows that the first result can be extended to yield uniform circuits as well.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes