CV · AI · MM · Jul 13, 2024

ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context

arXiv:2407.09774v3 · 12 citations · h-index: 8 · Has Code
AI Analysis

This work targets visual storytelling for applications such as animation and content creation, but it appears incremental, since it builds on existing methods with targeted enhancements.

The paper tackles the problem of generating coherent visual story frames from text by addressing the memory, speed, and context limitations of autoregressive methods, and reports significant performance improvements on the PororoSV and FlintstonesSV datasets.

Visual storytelling involves generating a sequence of coherent frames from a textual storyline while maintaining consistency in characters and scenes. Existing autoregressive methods, which rely on previous frame-sentence pairs, struggle with high memory usage, slow generation speeds, and limited context integration. To address these issues, we propose ContextualStory, a novel framework designed to generate coherent story frames and to extend existing frames for visual storytelling. ContextualStory uses Spatially-Enhanced Temporal Attention to capture spatial and temporal dependencies, effectively handling significant character movements. Additionally, we introduce a Storyline Contextualizer to enrich the context in the storyline embedding, and a StoryFlow Adapter that measures scene changes between frames to guide the model. Extensive experiments on the PororoSV and FlintstonesSV datasets demonstrate that ContextualStory significantly outperforms existing state-of-the-art (SOTA) methods in both story visualization and story continuation. Code is available at https://github.com/sixiaozheng/ContextualStory.
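The abstract credits Spatially-Enhanced Temporal Attention with handling significant character movements. The toy sketch here illustrates why spatial context matters, under stated assumptions: plain temporal attention lets each position attend only to the same (row, col) in the other frames, so a character that moves between frames is missed. As a hypothetical stand-in for the paper's spatial enhancement, we let each query also attend to a small spatial window in every frame. The window parameter `radius`, the identity query/key/value projections, and the names `attention_weights` and `temporal_attention` are our own illustrative choices, not ContextualStory's implementation.

```python
import math
from typing import Dict, List, Tuple

Vec = List[float]
Frames = List[List[List[Vec]]]  # frames[f][row][col] -> feature vector
Pos = Tuple[int, int, int]      # (frame, row, col)


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_weights(frames: Frames, f: int, r: int, c: int, radius: int = 0) -> Dict[Pos, float]:
    """Softmax weights of the query at (f, r, c) over its keys in all frames.

    radius=0 gives plain temporal attention: only the same (row, col) in each frame.
    radius>0 is a hypothetical spatially-enhanced variant (an assumption, not the
    paper's design): each frame contributes a (2*radius+1) x (2*radius+1) window.
    Queries and keys are the raw features (no learned projections).
    """
    height, width = len(frames[0]), len(frames[0][0])
    query = frames[f][r][c]
    scale = 1.0 / math.sqrt(len(query))
    positions = [
        (g, rr, cc)
        for g in range(len(frames))
        for rr in range(max(0, r - radius), min(height, r + radius + 1))
        for cc in range(max(0, c - radius), min(width, c + radius + 1))
    ]
    scores = [_dot(query, frames[g][rr][cc]) * scale for g, rr, cc in positions]
    peak = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {pos: e / total for pos, e in zip(positions, exps)}


def temporal_attention(frames: Frames, radius: int = 0) -> Frames:
    """Replace every feature with the attention-weighted sum of its keys (values = keys)."""
    dim = len(frames[0][0][0])
    out: Frames = []
    for f, frame in enumerate(frames):
        new_frame = []
        for r, row in enumerate(frame):
            new_row = []
            for c in range(len(row)):
                weights = attention_weights(frames, f, r, c, radius)
                new_row.append([
                    sum(w * frames[g][rr][cc][i] for (g, rr, cc), w in weights.items())
                    for i in range(dim)
                ])
            new_frame.append(new_row)
        out.append(new_frame)
    return out


# Toy story: a 1x2 grid over two frames; the "character" feature [5, 0]
# moves from column 0 in frame 0 to column 1 in frame 1; [0, 1] is background.
CHAR, BG = [5.0, 0.0], [0.0, 1.0]
story = [[[CHAR, BG]], [[BG, CHAR]]]

plain = attention_weights(story, f=1, r=0, c=1, radius=0)
enhanced = attention_weights(story, f=1, r=0, c=1, radius=1)
print((0, 0, 0) in plain)             # False: the old character position is never seen
print(round(enhanced[(0, 0, 0)], 3))  # 0.5: the moved character is matched across frames
```

With `radius=0`, the query at the character's new location never sees its old location; with `radius=1`, it places about half of its attention weight there. A real video-diffusion model would instead operate on latent feature maps with learned projections and multiple heads.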

Code Implementations: 1 repo