Surprisal and Metaphor Novelty Judgments: Moderate Correlations and Divergent Scaling Effects Revealed by Corpus-Based and Synthetic Datasets

Omar Momen, Emilie Sitter, Berenike Herrmann, Sina Zarrieß

arXiv:2601.02015

Originality: Incremental advance

AI Analysis

This paper addresses the problem of measuring linguistic creativity in metaphors, which is relevant to NLP researchers, but the advance is incremental because it builds on existing surprisal methods.

The study investigated whether surprisal from language models correlates with metaphor novelty annotations, finding moderate correlations but divergent scaling effects: correlation decreased with model size on corpus-based data and increased on synthetic data.

Novel metaphor comprehension involves complex semantic processes and linguistic creativity, making it an interesting task for studying language models (LMs). This study investigates whether surprisal, a probabilistic measure of predictability in LMs, correlates with annotations of metaphor novelty in different datasets. We analyse the surprisal of metaphoric words in corpus-based and synthetic metaphor datasets using 16 causal LM variants. We propose a cloze-style surprisal method that conditions on full-sentence context. Results show that LM surprisal yields significant moderate correlations with scores/labels of metaphor novelty. We further identify divergent scaling patterns: on corpus-based data, correlation strength decreases with model size (inverse scaling effect), whereas on synthetic data it increases (quality-power hypothesis). We conclude that while surprisal can partially account for annotations of metaphor novelty, it remains limited as a metric of linguistic creativity. Code and data are publicly available: https://github.com/OmarMomen14/surprisal-metaphor-novelty
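
As a rough illustration of the analysis the abstract describes, the sketch below computes word-level surprisal from token probabilities and correlates it with novelty scores. It is a minimal sketch under stated assumptions, not the authors' implementation: the probabilities and novelty scores are invented toy values, the function names are hypothetical, Spearman rank correlation is assumed (the abstract does not name the correlation measure), and multi-token words are assumed to be scored by summing subword surprisals. The paper's cloze-style conditioning on full-sentence context is not reproduced here; see the linked repository for the actual method.

```python
import math


def surprisal_bits(prob):
    """Surprisal of an event with probability `prob`, in bits: -log2(p)."""
    return -math.log2(prob)


def word_surprisal(subword_probs):
    """Surprisal of a word split into subword tokens.

    Assumption: the word's surprisal is the sum of its subword surprisals,
    which equals the surprisal of the product of the conditional probabilities.
    """
    return sum(surprisal_bits(p) for p in subword_probs)


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: per-word subword probabilities that an LM might
# assign to four metaphoric words, and invented novelty scores for them.
toy_probs = [[0.5], [0.25, 0.5], [0.01], [0.1, 0.2]]
toy_novelty = [0.1, 0.3, 0.9, 0.7]

surprisals = [word_surprisal(p) for p in toy_probs]
rho = spearman(surprisals, toy_novelty)
print("surprisals (bits):", [round(s, 2) for s in surprisals])
print("Spearman rho:", round(rho, 3))
```

In a real setting the probabilities would come from a causal LM's output distribution at the position of the metaphoric word, and repeating the correlation across models of different sizes would expose the scaling patterns the abstract reports.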
