CL | AI | Apr 5, 2024

Scope Ambiguities in Large Language Models

arXiv:2404.04332v1 | 30 citations | h-index: 43 | TACL
Originality: Synthesis-oriented
AI Analysis

This paper addresses a gap in understanding how modern language models process semantic ambiguities, which matters to NLP researchers and developers. The contribution is incremental: it builds on existing model evaluation without introducing new methods.

The paper investigates how models such as GPT-2, GPT-3/3.5, Llama 2, and GPT-4 handle scope-ambiguous sentences compared with human judgments, finding that some models identify human-preferred readings with over 90% accuracy.

Sentences containing multiple semantic operators with overlapping scope often create ambiguities in interpretation, known as scope ambiguities. These ambiguities offer rich insights into the interaction between semantic structure and world knowledge in language processing. Despite this, there has been little research into how modern large language models treat them. In this paper, we investigate how different versions of certain autoregressive language models -- GPT-2, GPT-3/3.5, Llama 2 and GPT-4 -- treat scope ambiguous sentences, and compare this with human judgments. We introduce novel datasets that contain a joint total of almost 1,000 unique scope-ambiguous sentences, containing interactions between a range of semantic operators, and annotated for human judgments. Using these datasets, we find evidence that several models (i) are sensitive to the meaning ambiguity in these sentences, in a way that patterns well with human judgments, and (ii) can successfully identify human-preferred readings at a high level of accuracy (over 90% in some cases).
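To make the notion of a scope ambiguity concrete, the sketch below evaluates the two readings of one sentence, "Every student read a book", against a small toy world. It illustrates the phenomenon only and is not the paper's evaluation method. The sentence, the function names, and the toy data are all assumptions chosen for illustration and do not come from the paper's datasets.

```python
# Toy world (hypothetical data): which student read which book.
students = {"ann", "bob"}
books = {"emma", "ulysses"}
read = {("ann", "emma"), ("bob", "ulysses")}


def surface_scope(students, books, read):
    """'every' scopes over 'a': each student read some book,
    possibly a different book for each student."""
    return all(any((s, b) in read for b in books) for s in students)


def inverse_scope(students, books, read):
    """'a' scopes over 'every': there is one particular book
    that every student read."""
    return any(all((s, b) in read for s in students) for b in books)


# The same sentence is true on one reading and false on the other.
print(surface_scope(students, books, read))  # True
print(inverse_scope(students, books, read))  # False
```

In this toy world the sentence is true under the surface-scope reading and false under the inverse-scope reading, because no single book was read by both students. World knowledge often makes one of the two readings far more plausible than the other.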

Code Implementations: 1 repo
