CV · May 13, 2024

The Lost Melody: Empirical Observations on Text-to-Video Generation From A Storytelling Perspective

Andrew Shin, Yusuke Mori, Kunitake Kaneko

arXiv:2405.08720v15.25 citations · h-index: 2

Originality: Synthesis-oriented

AI Analysis

This work identifies a gap in text-to-video generation for storytelling applications, offering incremental insights for researchers and developers in multimedia AI.

The paper examines text-to-video generation from a storytelling perspective, highlighting that current models focus on single scenes and neglect narrative aspects, and proposes an evaluation framework to address this limitation.

The text-to-video generation task has witnessed notable progress, with generated outputs reflecting the text prompts with high fidelity and impressive visual quality. However, current text-to-video generation models are invariably focused on conveying the visual elements of a single scene, and have so far been indifferent to another important potential of the medium, namely storytelling. In this paper, we examine text-to-video generation from a storytelling perspective, which has hardly been investigated, and make empirical remarks that spotlight the limitations of the current text-to-video generation scheme. We also propose an evaluation framework for the storytelling aspects of videos and discuss potential future directions.
