CL · LG · Oct 21, 2020

Explicitly Modeling Syntax in Language Models with Incremental Parsing and a Dynamic Oracle

arXiv:2011.07960v2 · 728 citations
AI Analysis

The paper addresses generalization and over-parametrization problems in language models used for NLP applications. The contribution is incremental, since it builds on existing syntax-aware approaches.

The authors tackle language models' failure to capture syntax by proposing Syntactic Ordered Memory (SOM), which models syntax explicitly through an incremental parser trained with a dynamic oracle. SOM achieves strong results on language modeling, incremental parsing, and syntactic generalization tests while using fewer parameters than other models.

Syntax is fundamental to our thinking about language. Failing to capture the structure of the input can lead to generalization problems and over-parametrization. In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM). SOM explicitly models sentence structure with an incremental parser while keeping the conditional probability setting of a standard left-to-right language model. To train the incremental parser and avoid exposure bias, we also propose a novel dynamic oracle, which makes SOM more robust to wrong parsing decisions. Experiments show that SOM achieves strong results in language modeling, incremental parsing, and syntactic generalization tests, while using fewer parameters than other models.
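For reference, the "conditional probability setting of a standard language model (left-to-right)" is the usual chain-rule factorization of a sentence $x_1, \dots, x_T$ into next-token predictions. As we read the abstract, SOM keeps this factorization: because its parser is incremental, any structure it uses at step $t$ is built from the prefix $x_{<t}$ only. How SOM parameterizes each conditional is described in the paper and is not reproduced here.

$$
p(x_1, \dots, x_T) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \dots, x_{t-1}\right),
\qquad
\log p(x_1, \dots, x_T) = \sum_{t=1}^{T} \log p\left(x_t \mid x_{<t}\right)
$$

Training maximizes the summed log-probability on the right. This is why a syntax-aware model in this setting can be compared directly against standard language models on perplexity.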
