CL · Mar 7, 2025

Coreference as an indicator of context scope in multimodal narrative

Nikolai Ilinykh, Shalom Lappin, Asad Sayeed, Sharid Loáiciga

arXiv:2503.05298v2 · 2 citations · h-index: 8 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a gap in how multimodal language models are evaluated on narrative tasks. It highlights limitations in context tracking that could affect applications such as automated storytelling and human-AI interaction.

The study finds that large multimodal language models differ substantially from humans in how they distribute coreferential expressions in visual storytelling. Humans maintain consistency across texts and images, while machines are less able to track mixed references despite perceived improvements in generation quality.

We demonstrate that large multimodal language models differ substantially from humans in the distribution of coreferential expressions in a visual storytelling task. We introduce a number of metrics to quantify the characteristics of coreferential patterns in both human- and machine-written texts. Humans distribute coreferential expressions in a way that maintains consistency across texts and images, interleaving references to different entities in a highly varied way. Machines are less able to track mixed references, despite achieving perceived improvements in generation quality. Materials, metrics, and code for our study are available at https://github.com/GU-CLASP/coreference-context-scope.
