Why transformers are obviously good models of language
This work addresses the problem of modeling language, a question of interest to both the linguistics and AI communities. It is incremental: it builds on existing transformer research without introducing new methods or data.
The paper argues that transformers are effective models of language. It highlights their empirical success over alternative models and connects their architecture to linguistic theories. It concludes that linguists should scrutinize the linguistic approaches transformers embody more closely, since these may be the best theories currently available.
Nobody knows how language works, but theories abound. Transformers are a class of neural networks that process language automatically with more success than alternatives, both those based on neural computations and those that rely on other (e.g., more symbolic) mechanisms. Here, I highlight direct connections between the transformer architecture and certain theoretical perspectives on language. The empirical success of transformers relative to alternative models provides circumstantial evidence that the linguistic approaches that transformers embody should, at least, be evaluated with greater scrutiny by the linguistics community and, at best, be considered the best theories currently available.