CL · AI · Jan 19, 2024

Language models align with human judgments on key grammatical constructions

arXiv:2402.01676v2 · 70 citations · PNAS
AI Analysis

This work addresses how researchers and practitioners should assess the linguistic capabilities of LLMs, and shows that LLMs perform well on grammaticality judgment tasks. The contribution is incremental in that it reinterprets existing data.

The authors re-evaluated a prior study on large language models (LLMs) and found that, contrary to earlier conclusions, LLMs achieve high accuracy and capture fine-grained variation in human grammaticality judgments, demonstrating strong alignment with human linguistic behaviors.

Do large language models (LLMs) make human-like linguistic generalizations? Dentella et al. (2023) ("DGL") prompt several LLMs ("Is the following sentence grammatically correct in English?") to elicit grammaticality judgments of 80 English sentences, concluding that LLMs demonstrate a "yes-response bias" and a "failure to distinguish grammatical from ungrammatical sentences". We re-evaluate LLM performance using well-established practices and find that DGL's data in fact provide evidence for just how well LLMs capture human behaviors. Models not only achieve high accuracy overall, but also capture fine-grained variation in human linguistic judgments.
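The two quantities at issue in the abstract, overall accuracy and a "yes-response bias", can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the evaluation procedure used by the authors or by DGL: the `Judgment` record, the function names, and the four toy items are hypothetical and are not data from either study. Here the bias is taken to be the gap between how often the model answers "yes" and how often the sentences are actually grammatical.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One sentence's gold label and the model's yes/no answer (hypothetical record)."""
    is_grammatical: bool  # gold label for the sentence
    said_yes: bool        # model answered "yes" to the grammaticality prompt


def accuracy(judgments):
    """Fraction of items where the model's answer matches the gold label."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(j.is_grammatical == j.said_yes for j in judgments) / len(judgments)


def yes_bias(judgments):
    """Rate of 'yes' answers minus the rate of grammatical items.

    Positive values mean the model says 'yes' more often than the
    items warrant; zero means no net bias toward either answer.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    n = len(judgments)
    yes_rate = sum(j.said_yes for j in judgments) / n
    base_rate = sum(j.is_grammatical for j in judgments) / n
    return yes_rate - base_rate


# Made-up toy items for illustration only (not data from either study).
toy = [
    Judgment(is_grammatical=True, said_yes=True),
    Judgment(is_grammatical=True, said_yes=True),
    Judgment(is_grammatical=False, said_yes=True),   # false "yes"
    Judgment(is_grammatical=False, said_yes=False),
]

toy_accuracy = accuracy(toy)
toy_bias = yes_bias(toy)
```

Reporting the two numbers separately matters because a model can be right on most items while still leaning toward "yes", and the sketch keeps those two effects apart.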

