CL · Mar 1, 2022

Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale

arXiv:2203.00633v2 · 231 citations · h-index: 77
AI Analysis

This work is aimed at NLP researchers who want to improve language models with syntax. Its contribution is incremental in that it combines existing Transformer and syntactic approaches.

The authors incorporate syntactic inductive biases into Transformer language models. The resulting Transformer Grammars (TGs) outperform strong baselines on sentence-level perplexity and on syntax-sensitive metrics, but the syntactic composition bottleneck harms document-level modeling.

We introduce Transformer Grammars (TGs), a novel class of Transformer language models that combine (i) the expressive power, scalability, and strong performance of Transformers and (ii) recursive syntactic compositions, which here are implemented through a special attention mask and deterministic transformation of the linearized tree. We find that TGs outperform various strong baselines on sentence-level language modeling perplexity, as well as on multiple syntax-sensitive language modeling evaluation metrics. Additionally, we find that the recursive syntactic composition bottleneck which represents each sentence as a single vector harms perplexity on document-level language modeling, providing evidence that a different kind of memory mechanism -- one that is independent of composed syntactic representations -- plays an important role in current successful models of long text.
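The abstract's "special attention mask and deterministic transformation of the linearized tree" can be illustrated with a toy sketch. This is not the authors' implementation; it is a simplified reading under stated assumptions: each closing bracket is duplicated (the labels `)1` and `)2` are hypothetical), the first copy attends only to its own constituent and then replaces it on a stack, every other position attends to whatever is on the stack, and every position may attend to itself. The function names `transform`, `attention_mask`, and `visible` are invented for this example.

```python
def transform(tokens):
    """Duplicate each closing bracket of a linearized tree.

    Assumption: ")1" is the copy that composes the constituent,
    ")2" is the copy that resumes ordinary stack attention.
    """
    out = []
    for tok in tokens:
        if tok == ")":
            out.extend([")1", ")2"])
        else:
            out.append(tok)
    return out


def attention_mask(tokens):
    """Return mask[i][j] == True when position i may attend to position j.

    Assumes a well-formed, already transformed sequence in which opening
    nonterminals look like "(S" or "(NP".
    """
    n = len(tokens)
    mask = [[False] * n for _ in range(n)]
    stack = []  # positions that are currently visible
    for i, tok in enumerate(tokens):
        mask[i][i] = True  # assumption: every position sees itself
        if tok == ")1":
            # Compose: pop back to the most recent opening nonterminal.
            k = max(idx for idx, p in enumerate(stack)
                    if tokens[p].startswith("("))
            for j in stack[k:]:
                mask[i][j] = True
            del stack[k:]
            stack.append(i)  # the composed vector replaces the constituent
        else:
            if tok != ")2":
                stack.append(i)  # assumption: ")2" is never pushed
            for j in stack:
                mask[i][j] = True
    return mask


def visible(mask, i):
    """List the positions that position i may attend to."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


tokens = transform("(S (NP the dog ) (VP barks ) )".split())
mask = attention_mask(tokens)
```

In this toy example, `visible(mask, 4)` covers only the `(NP the dog` constituent and the composing position itself, while the final position sees just the single composed position for the whole sentence. That single visible position is the bottleneck the abstract refers to when it says each sentence is represented as a single vector.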

