LGOct 24, 2023

Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions

arXiv:2310.16076v1140 citationsh-index: 22
Originality Incremental advance
AI Analysis

This work addresses the computational capabilities of linear Transformers for researchers in neural network theory, but it is incremental as it builds on existing results for standard Transformers.

The paper investigates the computational power of linear Transformers and their extensions, showing that these models can overcome limitations like generalization on the parity problem through experiments in formal language recognition.

Recent studies of the computational power of recurrent neural networks (RNNs) reveal a hierarchy of RNN architectures, given real-time and finite-precision assumptions. Here we study auto-regressive Transformers with linearised attention, a.k.a. linear Transformers (LTs) or Fast Weight Programmers (FWPs). LTs are special in the sense that they are equivalent to RNN-like sequence processors with a fixed-size state, while they can also be expressed as the now-popular self-attention networks. We show that many well-known results for the standard Transformer directly transfer to LTs/FWPs. Our formal language recognition experiments demonstrate how recently proposed FWP extensions such as recurrent FWPs and self-referential weight matrices successfully overcome certain limitations of the LT, e.g., allowing for generalisation on the parity problem. Our code is public.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes