CL Aug 27, 2018

Targeted Syntactic Evaluation of Language Models

arXiv:1808.09031v134.11239 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for better syntactic evaluation in NLP, but the advance is incremental: it builds on existing datasets and methods.

The authors evaluate the grammaticality of language model predictions using a dataset of minimally different grammatical and ungrammatical sentence pairs covering phenomena such as subject-verb agreement. An LSTM language model performed poorly, with only 65% accuracy. Multi-task training improved this to 75%, but a large gap remained relative to human accuracy of 95%.

We present a dataset for evaluating the grammaticality of the predictions of a language model. We automatically construct a large number of minimally different pairs of English sentences, each consisting of a grammatical and an ungrammatical sentence. The sentence pairs represent different variations of structure-sensitive phenomena: subject-verb agreement, reflexive anaphora and negative polarity items. We expect a language model to assign a higher probability to the grammatical sentence than the ungrammatical one. In an experiment using this data set, an LSTM language model performed poorly on many of the constructions. Multi-task training with a syntactic objective (CCG supertagging) improved the LSTM's accuracy, but a large gap remained between its performance and the accuracy of human participants recruited online. This suggests that there is considerable room for improvement over LSTMs in capturing syntax in a language model.
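
The evaluation criterion in the abstract (the model should assign a higher probability to the grammatical member of each pair) can be sketched as follows. This is a minimal illustration, not the authors' code: the add-one-smoothed bigram model stands in for the LSTM, and the toy corpus, the sentence pairs, and the names `train_bigram`, `log_prob`, and `pair_accuracy` are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase, split on whitespace, and add boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def train_bigram(corpus):
    """Collect context and bigram counts (a toy stand-in for a trained LM)."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = tokenize(sentence)
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Sentence log-probability under add-one-smoothed bigram estimates."""
    contexts, bigrams, vocab_size = model
    tokens = tokenize(sentence)
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )


def pair_accuracy(pairs, score):
    """Fraction of (grammatical, ungrammatical) pairs where the grammatical
    sentence gets the strictly higher score; ties count as errors."""
    correct = sum(score(good) > score(bad) for good, bad in pairs)
    return correct / len(pairs)


# Hypothetical toy training data and minimal pairs (not from the dataset).
corpus = [
    "the author laughs",
    "the authors laugh",
    "the pilot smiles",
    "the pilots smile",
]
pairs = [
    ("the author laughs", "the author laugh"),
    ("the pilots smile", "the pilots smiles"),
]

model = train_bigram(corpus)
accuracy = pair_accuracy(pairs, lambda s: log_prob(s, model))
print(f"accuracy on minimal pairs: {accuracy:.2f}")
```

Because `pair_accuracy` only needs a function that maps a sentence to a score, a neural language model's sentence log-probability could be passed in place of the bigram scorer without changing the evaluation loop.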
