A Discerning Several Thousand Judgments: GPT-3 Rates the Article + Adjective + Numeral + Noun Construction
This work addresses the challenge of assessing LLMs' syntactic knowledge of rare constructions. The contribution is incremental: it applies existing methods to a specific linguistic domain.
The study evaluates GPT-3's ability to judge the acceptability of a rare English syntactic construction, the Article + Adjective + Numeral + Noun pattern, and finds that its judgments are broadly similar to human judgments, with some divergences.
Knowledge of syntax includes knowledge of rare, idiosyncratic constructions. LLMs must overcome frequency biases in order to master such constructions. In this study, I prompt GPT-3 to give acceptability judgments on the English-language Article + Adjective + Numeral + Noun (AANN) construction (e.g., "a lovely five days"). I validate the prompt using the CoLA corpus of acceptability judgments and then zero in on the AANN construction. I compare GPT-3's judgments to crowdsourced human judgments on a subset of sentences. GPT-3's judgments are broadly similar to human judgments and generally align with proposed constraints in the literature, but in some cases GPT-3's judgments and human judgments diverge from the literature and from each other.
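The validation step described above can be illustrated with a minimal sketch. The abstract does not give the prompt wording or the validation metric, so everything here is an assumption: `build_prompt`, `parse_judgment`, and the prompt text are hypothetical, and the Matthews correlation coefficient (MCC) is used only because it is the conventional metric for CoLA. The call to the model itself is left out.

```python
from math import sqrt


def build_prompt(sentence: str) -> str:
    """Hypothetical zero-shot prompt; the wording used in the study is not given."""
    return (
        "Is the following English sentence grammatically acceptable? "
        "Answer 'acceptable' or 'unacceptable'.\n"
        f"Sentence: {sentence}\n"
        "Answer:"
    )


def parse_judgment(reply: str) -> int:
    """Map a free-text model reply to 1 (acceptable) or 0 (unacceptable)."""
    text = reply.strip().lower()
    # Check "unacceptable" first, since it is the more specific label.
    if text.startswith("unacceptable"):
        return 0
    if text.startswith("acceptable"):
        return 1
    raise ValueError(f"Unrecognized judgment: {reply!r}")


def matthews_corrcoef(gold: list[int], pred: list[int]) -> float:
    """MCC for binary labels; returns 0.0 when the denominator is zero."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In use, each CoLA sentence would be passed through `build_prompt`, the model's reply converted with `parse_judgment`, and the resulting labels scored against the corpus labels with `matthews_corrcoef`. The same function could in principle compare binarized model and human judgments on the AANN subset, though the comparison procedure used in the study may differ.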