CL | Apr 4, 2024

Probing Large Language Models for Scalar Adjective Lexical Semantics and Scalar Diversity Pragmatics

Fangru Lin, Daniel Altshuler, Janet B. Pierrehumbert

arXiv:2404.03301v1 | 23.982 citations | h-index: 47 | Has Code | LREC

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of evaluating LLMs' linguistic and pragmatic knowledge. The contribution is incremental in nature.

The study investigated how large language models (LLMs) like GPT-4 understand scalar adjectives and their pragmatic implications, finding that while models encode rich lexical-semantic information, they struggle with scalar diversity, and larger models do not consistently perform better.

Scalar adjectives pertain to various domain scales and vary in intensity within each scale (e.g. "certain" is more intense than "likely" on the likelihood scale). Scalar implicatures arise from the consideration of alternative statements which could have been made. They can be triggered by scalar adjectives and require listeners to reason pragmatically about them. Some scalar adjectives are more likely to trigger scalar implicatures than others. This phenomenon is referred to as scalar diversity. In this study, we probe different families of Large Language Models such as GPT-4 for their knowledge of the lexical semantics of scalar adjectives and one specific aspect of their pragmatics, namely scalar diversity. We find that they encode rich lexical-semantic information about scalar adjectives. However, the rich lexical-semantic knowledge does not entail a good understanding of scalar diversity. We also compare current models of different sizes and complexities and find that larger models are not always better. Finally, we explain our probing results by leveraging linguistic intuitions and model training objectives.
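To make the terminology in the abstract concrete, the toy sketch below represents a scale as an ordered list of adjectives and enumerates the stronger alternatives that a scalar implicature would negate. It is an illustration under stated assumptions, not the paper's probing method: the `SCALES` table and all function names are hypothetical, and only the ordering of "likely" below "certain" comes from the abstract ("possible" is an added assumption).

```python
# Toy illustration only: hand-written scale, not data or code from the paper.
# Each scale lists its adjectives from weakest to strongest.
SCALES = {
    # "likely" < "certain" is from the abstract; "possible" is an assumption.
    "likelihood": ["possible", "likely", "certain"],
}


def find_scale(adjective):
    """Return (scale_name, ordered_members) for the scale containing adjective."""
    for name, members in SCALES.items():
        if adjective in members:
            return name, members
    raise KeyError(f"no scale contains {adjective!r}")


def is_more_intense(a, b):
    """True if a is stronger than b; both must sit on the same scale."""
    _, members = find_scale(a)
    if b not in members:
        raise ValueError(f"{a!r} and {b!r} are not on the same scale")
    return members.index(a) > members.index(b)


def stronger_alternatives(adjective):
    """Adjectives on the same scale that are stronger than the given one."""
    _, members = find_scale(adjective)
    return members[members.index(adjective) + 1:]


def candidate_implicatures(adjective):
    """Negated stronger alternatives a listener might infer."""
    return [f"not {alt}" for alt in stronger_alternatives(adjective)]


if __name__ == "__main__":
    print(is_more_intense("certain", "likely"))
    print(candidate_implicatures("likely"))
```

The sketch only enumerates candidate inferences. Scalar diversity concerns how often listeners actually draw such an inference for a given adjective, which is an empirical rate that a static table like this one does not capture.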
