CV | Sep 4, 2025

Plot'n Polish: Zero-shot Story Visualization and Disentangled Editing with Text-to-Image Diffusion Models

arXiv:2509.04446v1 | 1 citation | h-index: 16
Originality: Incremental advance
AI Analysis

The paper addresses the need for enhanced control in creative domains, though the advance appears incremental because it builds on existing diffusion models.

The paper tackles the lack of control and consistency in story visualization with text-to-image diffusion models. It introduces Plot'n Polish, which enables zero-shot generation and fine-grained editing while maintaining narrative consistency.

Text-to-image diffusion models have demonstrated significant capabilities to generate diverse and detailed visuals in various domains, and story visualization is emerging as a particularly promising application. However, as their use in real-world creative domains increases, the need for providing enhanced control, refinement, and the ability to modify images post-generation in a consistent manner becomes an important challenge. Existing methods often lack the flexibility to apply fine or coarse edits while maintaining visual and narrative consistency across multiple frames, preventing creators from seamlessly crafting and refining their visual stories. To address these challenges, we introduce Plot'n Polish, a zero-shot framework that enables consistent story generation and provides fine-grained control over story visualizations at various levels of detail.

