CL | Aug 22, 2025

ComicScene154: A Scene Dataset for Comic Analysis

Sandro Paval, Ivan P. Yamshchikov, Pascal Meißner

arXiv:2508.16190v1 | 1 citation | h-index: 7 | EMNLP

Originality: Synthesis-oriented

AI Analysis

This dataset gives researchers in multimodal narrative understanding and the NLP community a new resource for comic analysis, though the contribution is incremental because it builds on existing multimodal data concepts.

The authors address the lack of computational resources for analyzing comics by introducing ComicScene154, a manually annotated dataset of scene-level narrative arcs from public-domain comic books, and they provide a baseline scene segmentation pipeline as an initial benchmark.

Comics offer a compelling yet under-explored domain for computational narrative analysis, combining text and imagery in ways distinct from purely textual or audiovisual media. We introduce ComicScene154, a manually annotated dataset of scene-level narrative arcs derived from public-domain comic books spanning diverse genres. By conceptualizing comics as an abstraction for narrative-driven, multimodal data, we highlight their potential to inform broader research on multi-modal storytelling. To demonstrate the utility of ComicScene154, we present a baseline scene segmentation pipeline, providing an initial benchmark that future studies can build upon. Our results indicate that ComicScene154 constitutes a valuable resource for advancing computational methods in multimodal narrative understanding and expanding the scope of comic analysis within the Natural Language Processing community.
