CV · May 30, 2025

ViStoryBench: Comprehensive Benchmark Suite for Story Visualization

arXiv:2505.24862v3 · 17 citations · h-index: 11 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better evaluation tools in visual storytelling for researchers and developers, though it is incremental, building on existing benchmarks.

The authors tackle the problem of evaluating story visualization models by introducing ViStoryBench, a comprehensive benchmark suite that assesses models across diverse narratives and character settings, together with automated metrics validated through human studies.

Story visualization aims to generate coherent image sequences that faithfully depict a narrative and align with character references. Despite progress in generative models, existing benchmarks are narrow in scope: they are often limited to short prompts, lack character references, or cover only single images, and they fall short of real-world storytelling complexity. This hinders a nuanced understanding of model capabilities and limitations. We present ViStoryBench, a comprehensive benchmark designed to evaluate story visualization models across diverse narrative structures, visual styles, and character settings. The benchmark features richly annotated multi-shot scripts derived from curated stories spanning literature, film, and folklore. Large language models assist in story summarization and script generation, with all outputs verified by humans to ensure coherence and fidelity. Character references are carefully curated to maintain intra-story consistency across varying artistic styles. To enable thorough evaluation, ViStoryBench introduces a set of automated metrics that assess character consistency, style similarity, prompt adherence, aesthetic quality, and generation artifacts such as copy-paste behavior. These metrics are validated through human studies and used to benchmark a broad range of open-source and commercial models. ViStoryBench offers a high-fidelity, multi-dimensional evaluation suite that facilitates systematic analysis and fosters future progress in visual storytelling.
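The abstract does not say how the character-consistency metric is computed. A common generic approach is to embed each character crop with an image or face encoder and compare embeddings by cosine similarity. The sketch below illustrates only that generic idea, assuming the embeddings have already been extracted; the function name `character_consistency` and its inputs are hypothetical, and this is not ViStoryBench's actual implementation.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def character_consistency(reference: Sequence[float],
                          shots: List[Sequence[float]]) -> float:
    """Hypothetical score: mean cosine similarity between a character's
    reference embedding and that character's embedding in each generated shot.
    Assumes embeddings come from some pretrained encoder (not specified here)."""
    if not shots:
        raise ValueError("need at least one shot")
    return sum(cosine_similarity(reference, s) for s in shots) / len(shots)


if __name__ == "__main__":
    ref = [1.0, 0.0]
    shots = [[1.0, 0.0], [0.0, 1.0]]  # one faithful shot, one unrelated shot
    print(character_consistency(ref, shots))
```

Averaging over shots rewards characters that stay recognizable across the whole sequence rather than in a single image; a real benchmark would also need a step to detect and match each character in every shot before embedding it.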

Code Implementations: 1 repo
