CL · Aug 27, 2018

Targeted Syntactic Evaluation of Language Models

arXiv:1808.09031v1 · 1239 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for better syntactic evaluation in NLP, but it is incremental because it builds on existing datasets and methods.

The authors tackled the problem of evaluating the grammaticality of language model predictions by creating a dataset of minimally different grammatical and ungrammatical sentence pairs covering phenomena such as subject-verb agreement. An LSTM language model performed poorly, reaching only 65% accuracy. Multi-task training improved it to 75%, but a large gap remained compared to human accuracy of 95%.

We present a dataset for evaluating the grammaticality of the predictions of a language model. We automatically construct a large number of minimally different pairs of English sentences, each consisting of a grammatical and an ungrammatical sentence. The sentence pairs represent different variations of structure-sensitive phenomena: subject-verb agreement, reflexive anaphora and negative polarity items. We expect a language model to assign a higher probability to the grammatical sentence than the ungrammatical one. In an experiment using this dataset, an LSTM language model performed poorly on many of the constructions. Multi-task training with a syntactic objective (CCG supertagging) improved the LSTM's accuracy, but a large gap remained between its performance and the accuracy of human participants recruited online. This suggests that there is considerable room for improvement over LSTMs in capturing syntax in a language model.
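The evaluation criterion described in the abstract (a model "passes" a pair when it assigns higher probability to the grammatical sentence) can be illustrated with a small sketch. This is not the authors' code: the lexicon, the training corpus, and the helper names `agreement_pairs`, `BigramLM` and `pair_accuracy` are hypothetical, and an add-one-smoothed bigram model is swapped in for the paper's LSTM purely so the example is self-contained. It shows template-based construction of subject-verb agreement minimal pairs and accuracy computed as the fraction of pairs scored in the grammatical sentence's favour.

```python
import math
from collections import Counter

# Hypothetical mini-lexicon for illustration only; the paper's templates are larger.
SUBJECTS = {"sg": ["the author", "the pilot"], "pl": ["the authors", "the pilots"]}
VERBS = {"sg": ["laughs", "smiles"], "pl": ["laugh", "smile"]}


def agreement_pairs():
    """Yield (grammatical, ungrammatical) minimal pairs for simple subject-verb
    agreement: the two sentences differ only in the number of the verb."""
    for number, other in (("sg", "pl"), ("pl", "sg")):
        for subject in SUBJECTS[number]:
            for good_verb, bad_verb in zip(VERBS[number], VERBS[other]):
                yield f"{subject} {good_verb}", f"{subject} {bad_verb}"


class BigramLM:
    """Add-one-smoothed bigram LM; a toy stand-in for the paper's LSTM."""

    def __init__(self, corpus):
        self.unigrams = Counter()  # counts of each token as a bigram history
        self.bigrams = Counter()
        for sentence in corpus:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.unigrams.update(tokens[:-1])
            self.bigrams.update(zip(tokens[:-1], tokens[1:]))
        # Possible next tokens: every training word plus the end marker.
        self.vocab_size = len({t for s in corpus for t in s.split()}) + 1

    def logprob(self, sentence):
        """Sum of log P(w_i | w_{i-1}) over the sentence, including </s>."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((self.bigrams[(prev, cur)] + 1)
                     / (self.unigrams[prev] + self.vocab_size))
            for prev, cur in zip(tokens[:-1], tokens[1:])
        )


def pair_accuracy(lm, pairs):
    """Fraction of pairs where the grammatical sentence scores strictly higher;
    ties count as errors."""
    pairs = list(pairs)
    correct = sum(lm.logprob(good) > lm.logprob(bad) for good, bad in pairs)
    return correct / len(pairs)


# Tiny grammatical training corpus (hypothetical).
CORPUS = ["the author laughs", "the pilot smiles",
          "the authors smile", "the pilots laugh"]

if __name__ == "__main__":
    lm = BigramLM(CORPUS)
    print(pair_accuracy(lm, agreement_pairs()))  # → 0.5
```

The toy model only prefers the grammatical sentence for the four pairs whose exact subject-verb bigram appeared in training and ties on the other four, which count as errors. It has no abstract notion of grammatical number, which is the kind of structure-sensitive generalization minimal pairs are designed to probe.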

Code Implementations: 5 repos
