CL · Feb 19, 2025

Meaning Beyond Truth Conditions: Evaluating Discourse Level Understanding via Anaphora Accessibility

arXiv:2502.14119v1 · 1 citation · h-index: 2 · ACL
Originality: Incremental advance
AI Analysis

This work addresses the need for better discourse-level evaluation in NLP. It is an incremental advance, building on existing theoretical research to propose a new diagnostic task.

The authors address the problem of evaluating discourse-level understanding in natural language processing by proposing anaphora accessibility as a diagnostic task. They find that LLMs and humans perform similarly on some tasks but diverge on others, and they attribute this divergence to LLMs' reliance on specific lexical items, in contrast to human sensitivity to structural abstractions.

We present a hierarchy of natural language understanding abilities and argue for the importance of moving beyond assessments of understanding at the lexical and sentence levels to the discourse level. We propose the task of anaphora accessibility as a diagnostic for assessing discourse understanding, and to this end, present an evaluation dataset inspired by theoretical research in dynamic semantics. We evaluate human and LLM performance on our dataset and find that LLMs and humans align on some tasks and diverge on others. Such divergence can be explained by LLMs' reliance on specific lexical items during language comprehension, in contrast to human sensitivity to structural abstractions.
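To make "anaphora accessibility" concrete, here is a minimal Python sketch of the classic accessibility constraint from Discourse Representation Theory, one standard formulation within dynamic semantics. It is not the authors' dataset or evaluation code. The `Box` class, the `accessible` function, and the toy discourses are hypothetical illustrations. The idea is that a discourse referent introduced under negation is not available to a later pronoun ("John doesn't own a car. #It is red."). By contrast, a referent introduced in the antecedent of a conditional remains available in its consequent ("If a farmer owns a donkey, he beats it.").

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Box:
    """Hypothetical, simplified DRS box: the discourse referents it introduces
    plus links to the boxes that are superordinate to it."""
    referents: set[str] = field(default_factory=set)
    parent: Optional[Box] = None      # box that directly embeds this one
    antecedent: Optional[Box] = None  # set only on a conditional's consequent


def accessible(box: Box) -> set[str]:
    """Return the referents a pronoun located in `box` may pick up: those
    introduced in the box itself, in the antecedent of a conditional whose
    consequent it is, and in every box that embeds it."""
    found: set[str] = set()
    current: Optional[Box] = box
    while current is not None:
        found |= current.referents
        if current.antecedent is not None:
            found |= current.antecedent.referents
        current = current.parent
    return found


# "A farmer owns a donkey. It is grey." Both referents sit in the main box.
main = Box({"farmer", "donkey"})
print("donkey" in accessible(main))  # True

# "John doesn't own a car. #It is red." The car is introduced under negation.
main = Box({"john"})
negated = Box({"car"}, parent=main)
print("car" in accessible(main))     # False: "it" in the main box cannot reach it
print("car" in accessible(negated))  # True: visible inside the negation itself

# "If a farmer owns a donkey, he beats it." The antecedent feeds the consequent.
main = Box()
if_part = Box({"farmer", "donkey"}, parent=main)
then_part = Box(parent=main, antecedent=if_part)
print("donkey" in accessible(then_part))  # True
print("donkey" in accessible(main))       # False
```

The sketch decides accessibility purely from how boxes embed one another, not from the particular words used. This is the kind of structural sensitivity the abstract contrasts with reliance on specific lexical items. It deliberately omits other operators, such as universal quantifiers and disjunction.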
