CL | AI | Apr 1, 2025

Is the Top Still Spinning? Evaluating Subjectivity in Narrative Understanding

arXiv:2504.01132v2 | 1 citation | h-index: 10 | EMNLP
Originality: Incremental advance
AI Analysis

This paper addresses ambiguous claims in narrative summarization, offering researchers and practitioners a more reliable evaluation method. The advance is incremental, since it builds on existing faithfulness tasks.

The paper tackles the problem of evaluating claim faithfulness in narrative understanding by reframing the task to manage subjectivity. It introduces the Ambiguity Rewrite Metric (ARM), which uses LLM-generated edits to provide a more nuanced evaluation, and reports a 21% absolute improvement in annotator agreement.

Determining faithfulness of a claim to a source document is an important problem across many domains. This task is generally treated as a binary judgment of whether the claim is supported or unsupported in relation to the source. In many cases, though, whether a claim is supported can be ambiguous. For instance, it may depend on making inferences from given evidence, and different people can reasonably interpret the claim as either supported or unsupported based on their agreement with those inferences. Forcing binary labels upon such claims lowers the reliability of evaluation. In this work, we reframe the task to manage the subjectivity involved with factuality judgments of ambiguous claims. We introduce LLM-generated edits of summaries as a method of providing a nuanced evaluation of claims: how much does a summary need to be edited to be unambiguous? Whether a claim gets rewritten and how much it changes can be used as an automatic evaluation metric, the Ambiguity Rewrite Metric (ARM), with a much richer feedback signal than a binary judgment of faithfulness. We focus on the area of narrative summarization as it is particularly rife with ambiguity and subjective interpretation. We show that ARM produces a 21% absolute improvement in annotator agreement on claim faithfulness, indicating that subjectivity is reduced.
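The abstract describes the signal behind ARM as two questions: whether a claim gets rewritten, and how much it changes. The sketch below illustrates that idea only. It is not the authors' implementation: the function name `rewrite_score`, the whitespace tokenization, and the use of `difflib.SequenceMatcher` as the change measure are all assumptions made for illustration, and the rewritten claim is supplied by hand where the paper would use an LLM-generated edit.

```python
from difflib import SequenceMatcher


def rewrite_score(original: str, rewritten: str) -> dict:
    """Toy edit-based signal (hypothetical, not the paper's ARM definition).

    Reports whether the claim was rewritten and what fraction of it changed,
    measured over whitespace-separated tokens.
    """
    orig_tokens = original.split()
    new_tokens = rewritten.split()
    # ratio() is 1.0 for identical sequences and 0.0 when nothing matches.
    similarity = SequenceMatcher(None, orig_tokens, new_tokens).ratio()
    return {
        "was_rewritten": orig_tokens != new_tokens,
        "change_ratio": 1.0 - similarity,
    }


if __name__ == "__main__":
    # Hypothetical claim and a hand-written, less ambiguous rewrite.
    claim = "The top is still spinning at the end."
    edited = "It is left unclear whether the top is still spinning at the end."
    print(rewrite_score(claim, edited))
```

An unchanged claim yields `was_rewritten = False` and a `change_ratio` of 0.0, while a heavily edited claim pushes the ratio toward 1.0. That gives a graded signal in place of a binary supported/unsupported label, which is the contrast the abstract draws.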

Code Implementations: 1 repo
