CL · Mar 26

Humans vs Vision-Language Models: A Unified Measure of Narrative Coherence

arXiv:2603.25537 · 75.9 · h-index: 8 · Has Code
AI Analysis

This work addresses the evaluation of narrative coherence in AI-generated stories and is aimed at researchers and developers in natural language processing and vision-language modeling. The contribution is incremental, since it builds on existing metrics and datasets.

The study compared narrative coherence in visually grounded stories written by humans with stories generated by vision-language models (VLMs). It found that the VLMs share broadly similar coherence profiles, and that these profiles differ systematically from those of humans in how discourse is organized.

We study narrative coherence in visually grounded stories by comparing human-written narratives with those generated by vision-language models (VLMs) on the Visual Writing Prompts corpus. Using a set of metrics that capture different aspects of narrative coherence, including coreference, discourse relation types, topic continuity, character persistence, and multimodal character grounding, we compute a narrative coherence score. We find that VLMs show broadly similar coherence profiles that differ systematically from those of humans. In addition, differences for individual measures are often subtle, but they become clearer when considered jointly. Overall, our results indicate that, despite human-like surface fluency, model narratives exhibit systematic differences from those of humans in how they organise discourse across a visually grounded story. Our code is available at https://github.com/GU-CLASP/coherence-driven-humans.
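The abstract does not specify how the individual measures are combined into a single narrative coherence score. As a rough illustration only, the sketch below assumes one common approach. It standardizes each metric against the human stories (z-scores), averages the standardized values into a per-story score, and compares groups by the Euclidean distance between their mean metric profiles. The metric keys, the unweighted average, the distance measure, and all numbers are hypothetical toy values. They are not the paper's method or results; the linked repository holds the actual implementation.

```python
import math
from statistics import mean, pstdev

# Hypothetical metric keys mirroring the measures named in the abstract;
# the paper's exact metrics, scaling, and weighting may differ.
METRICS = ["coreference", "discourse_relations", "topic_continuity",
           "character_persistence", "character_grounding"]


def zscore_params(reference):
    """Per-metric (mean, std) from a reference pool, e.g. human stories."""
    params = {}
    for m in METRICS:
        values = [story[m] for story in reference]
        sd = pstdev(values)
        params[m] = (mean(values), sd if sd > 0 else 1.0)  # guard against zero spread
    return params


def coherence_score(story, params):
    """Illustrative composite: unweighted mean of standardized metric values."""
    return mean((story[m] - params[m][0]) / params[m][1] for m in METRICS)


def mean_profile(stories, params):
    """Mean z-score per metric for a group of stories."""
    return [mean((s[m] - params[m][0]) / params[m][1] for s in stories)
            for m in METRICS]


def profile_distance(group_a, group_b, params):
    """Euclidean distance between two groups' mean metric profiles."""
    pa, pb = mean_profile(group_a, params), mean_profile(group_b, params)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


# Invented toy values, for illustration only (not data from the paper).
humans = [
    {"coreference": 0.62, "discourse_relations": 0.40, "topic_continuity": 0.71,
     "character_persistence": 0.55, "character_grounding": 0.80},
    {"coreference": 0.58, "discourse_relations": 0.45, "topic_continuity": 0.66,
     "character_persistence": 0.60, "character_grounding": 0.75},
    {"coreference": 0.65, "discourse_relations": 0.38, "topic_continuity": 0.69,
     "character_persistence": 0.50, "character_grounding": 0.82},
]
models = [
    {"coreference": 0.70, "discourse_relations": 0.30, "topic_continuity": 0.74,
     "character_persistence": 0.62, "character_grounding": 0.78},
    {"coreference": 0.68, "discourse_relations": 0.32, "topic_continuity": 0.72,
     "character_persistence": 0.65, "character_grounding": 0.79},
]

params = zscore_params(humans)
print([round(coherence_score(s, params), 3) for s in models])
print(round(profile_distance(humans, models, params), 3))
```

Standardizing against the human pool puts metrics with different scales on a common footing. The profile distance shows how several small per-metric gaps can add up to a clearer joint difference, which is the intuition the abstract describes.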
