Finding Sense in Nonsense with Generated Contexts: Perspectives from Humans and Language Models
This work addresses a core challenge in semantic interpretation for computational linguistics, although its contribution is incremental because it builds on existing datasets and methods.
The study assessed how nonsensical, as opposed to merely anomalous, the sentences in existing datasets are, and evaluated how well large language models (LLMs) can distinguish between the two. Human raters judged most sentences to be anomalous rather than nonsensical, and LLMs were skilled at generating plausible contexts for anomalous cases.
Nonsensical and anomalous sentences have been instrumental in the development of computational models of semantic interpretation. A core challenge is to distinguish between what is merely anomalous (but can be interpreted given a supporting context) and what is truly nonsensical. However, it is unclear (a) how nonsensical, rather than merely anomalous, existing datasets are; and (b) how well LLMs can make this distinction. In this paper, we answer both questions by collecting sensicality judgments from human raters and LLMs on sentences from five semantically deviant datasets, both without context and with a supporting context provided. We find that raters consider most sentences to be at most anomalous, and only a few to be properly nonsensical. We also show that LLMs are substantially skilled at generating plausible contexts for anomalous cases.