CV · Sep 4, 2025

TaleDiffusion: Multi-Character Story Generation with Dialogue Rendering

arXiv:2509.04123v1 · 2 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses artifact generation and disjointed storytelling in text-to-story visualization, with applications such as animation and interactive media. The advance is incremental because it builds on existing diffusion and attention techniques.

The paper tackles the problem of generating consistent multi-character story visualizations with accurate dialogue rendering. It introduces TaleDiffusion, which the authors report outperforms existing methods in consistency, noise reduction, and dialogue rendering.

Text-to-story visualization is challenging due to the need for consistent interaction among multiple characters across frames. Existing methods struggle with character consistency, leading to artifact generation and inaccurate dialogue rendering, which results in disjointed storytelling. In response, we introduce TaleDiffusion, a novel framework for generating multi-character stories with an iterative process, maintaining character consistency, and accurate dialogue assignment via postprocessing. Given a story, we use a pre-trained LLM to generate per-frame descriptions, character details, and dialogues via in-context learning, followed by a bounded attention-based per-box mask technique to control character interactions and minimize artifacts. We then apply an identity-consistent self-attention mechanism to ensure character consistency across frames and region-aware cross-attention for precise object placement. Dialogues are also rendered as bubbles and assigned to characters via CLIPSeg. Experimental results demonstrate that TaleDiffusion outperforms existing methods in consistency, noise reduction, and dialogue rendering.
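
The per-box mask idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Box` type, the `owner_map` and `may_attend` helpers, and the rule that background cells may attend anywhere are all assumptions made for illustration. It only shows how character boxes on a latent grid could be turned into a mask that stops one character's region from attending to another's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical character box in latent-grid cells; x1 and y1 are exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int


def box_mask(box, height, width):
    """Return a height x width grid of 0/1 marking the cells inside the box."""
    return [
        [1 if box.x0 <= x < box.x1 and box.y0 <= y < box.y1 else 0
         for x in range(width)]
        for y in range(height)
    ]


def owner_map(boxes, height, width):
    """Label each cell with the index of the first box covering it, or -1 for background."""
    owners = [[-1] * width for _ in range(height)]
    for idx, box in enumerate(boxes):
        mask = box_mask(box, height, width)
        for y in range(height):
            for x in range(width):
                if mask[y][x] and owners[y][x] == -1:
                    owners[y][x] = idx
    return owners


def may_attend(owners, query, key):
    """Assumed rule: block attention only between cells owned by two different characters."""
    q_owner = owners[query[0]][query[1]]
    k_owner = owners[key[0]][key[1]]
    return q_owner == -1 or k_owner == -1 or q_owner == k_owner


# Two characters side by side on a 4 x 5 grid, with one background column between them.
boxes = [Box(0, 0, 2, 4), Box(3, 0, 5, 4)]
owners = owner_map(boxes, height=4, width=5)
print(owners[0])                            # owner labels for the first row
print(may_attend(owners, (0, 0), (0, 4)))   # cell of character 0 vs. cell of character 1
```

In a real diffusion model the same boolean rule would be applied as a mask over attention logits rather than evaluated cell by cell; the nested-list form here is chosen only for readability.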

