CL · Jan 22, 2025

Implicit Causality-biases in humans and LLMs as a tool for benchmarking LLM discourse capabilities

arXiv:2501.12980v1 · 2 citations · h-index: 10
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of benchmarking LLM discourse understanding. Its contribution is incremental: it applies existing psycholinguistic methods to LLMs without major breakthroughs.

The paper compares human and LLM responses to implicit causality verbs across four experiments. Only the largest monolingual LLM showed human-like coreference biases, and no LLM displayed human-like coherence biases. All LLMs preferred simpler referring expressions for subjects than for objects, but none showed the bias effects on referring expressions seen in humans.

In this paper, we compare data generated with mono- and multilingual LLMs spanning a range of model sizes with data provided by human participants in an experimental setting investigating well-established discourse biases. Beyond the comparison as such, we aim to develop a benchmark to assess the capabilities of LLMs with discourse biases as a robust proxy for more general discourse understanding capabilities. More specifically, we investigated Implicit Causality verbs, for which psycholinguistic research has found participants to display biases with regard to three phenomena: the establishment of (i) coreference relations (Experiment 1), (ii) coherence relations (Experiment 2), and (iii) the use of particular referring expressions (Experiments 3 and 4). With regard to coreference biases, we found only the largest monolingual LLM (German Bloom 6.4B) to display more human-like biases. For coherence relations, no LLM displayed the explanation bias usually found for humans. For referring expressions, all LLMs displayed a preference for referring to subject arguments with simpler forms than to objects. However, no bias effect on referring expressions was found, as opposed to recent studies investigating human biases.

