CL · AI · Jul 24, 2024

Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models

arXiv:2407.17406v1 · 29 citations · h-index: 27 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of enhancing language model generalization for NLP applications by incorporating syntactic dependency structures, representing an incremental advance over prior constituency-based approaches.

The authors aim to improve the generalization of Transformer language models by integrating dependency structures. They introduce Dependency Transformer Grammars (DTGs), which modify attention masks and relative positional encoding to simulate a dependency transition system. DTGs generalize better than Transformer language model baselines at comparable perplexity and outperform recent constituency-based models, which suggests that dependency structures guide these models more effectively than constituency structures do.

Syntactic Transformer language models aim to achieve better generalization by simultaneously modeling syntax trees and sentences. While prior work has focused on adding constituency-based structures to Transformers, we introduce Dependency Transformer Grammars (DTGs), a new class of Transformer language model with an explicit dependency-based inductive bias. DTGs simulate dependency transition systems with constrained attention patterns by modifying attention masks, incorporate stack information through relative positional encoding, and augment dependency arc representations with a combination of token embeddings and operation embeddings. When trained on a dataset of sentences annotated with dependency trees, DTGs achieve better generalization while maintaining perplexity comparable to Transformer language model baselines. DTGs also outperform recent constituency-based models, showing that dependency can better guide Transformer language models. Our code is released at https://github.com/zhaoyd1/Dep_Transformer_Grammars.
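To make the idea of simulating a transition system with attention masks more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes an arc-standard transition system with SHIFT, LEFT-ARC and RIGHT-ARC actions, and a deliberately simplified masking rule: a SHIFT step attends to the items currently on the stack, and an arc step attends to the two items it merges. The function names `arc_standard_oracle` and `stack_attention_mask` are hypothetical, and the paper's actual action set and masks may differ.

```python
from typing import List

SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT-ARC", "RIGHT-ARC"


def arc_standard_oracle(heads: List[int]) -> List[str]:
    """Turn a projective dependency tree into an arc-standard action sequence.

    heads[i] is the 1-based index of the head of token i + 1; 0 marks the root.
    """
    n = len(heads)
    pending = [0] * (n + 1)  # dependents each token still has to collect
    for h in heads:
        pending[h] += 1
    stack: List[int] = []
    actions: List[str] = []
    next_tok = 1
    while next_tok <= n or len(stack) > 1:
        if len(stack) >= 2:
            top, below = stack[-1], stack[-2]
            # LEFT-ARC: the second item is a finished dependent of the top item.
            if heads[below - 1] == top and pending[below] == 0:
                actions.append(LEFT_ARC)
                stack.pop(-2)
                pending[top] -= 1
                continue
            # RIGHT-ARC: the top item is a finished dependent of the second item.
            if heads[top - 1] == below and pending[top] == 0:
                actions.append(RIGHT_ARC)
                stack.pop()
                pending[below] -= 1
                continue
        if next_tok > n:
            raise ValueError("tree is non-projective or malformed")
        actions.append(SHIFT)
        stack.append(next_tok)
        next_tok += 1
    return actions


def stack_attention_mask(actions: List[str]) -> List[List[bool]]:
    """Build a simplified mask: mask[i][j] is True if step i may attend to step j.

    Assumption (not from the paper): each stack item is represented by the
    step that created it, and an arc step replaces two items with itself.
    """
    stack: List[int] = []
    mask: List[List[bool]] = []
    for i, act in enumerate(actions):
        if act == SHIFT:
            stack.append(i)
            visible = set(stack)  # attend to everything on the stack
        else:
            right, left = stack.pop(), stack.pop()
            visible = {left, right, i}  # attend to the two merged items
            stack.append(i)
        mask.append([j in visible for j in range(len(actions))])
    return mask


# "the dog barks": the -> dog, dog -> barks, barks is the root.
actions = arc_standard_oracle([2, 3, 0])
print(actions)  # ['SHIFT', 'SHIFT', 'LEFT-ARC', 'SHIFT', 'LEFT-ARC']
mask = stack_attention_mask(actions)
```

In this sketch, the fourth step (the SHIFT of "barks") can see only the merged "the dog" item and itself, not the individual earlier tokens. That restriction is the kind of structural bottleneck a constrained attention mask is meant to impose.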

Code Implementations: 1 repo
