CV, AI, LG, ML | Feb 8, 2023

Zero-shot Generation of Coherent Storybook from Plain Text Story using Diffusion Models

arXiv:2302.03900v1 | 33 citations | h-index: 38
Originality: Incremental advance
AI Analysis

This paper addresses the need for coherent image sequences in applications such as storytelling. Its zero-shot approach avoids expensive training data, though the contribution is incremental because it builds on existing diffusion models.

The paper tackles the problem of generating coherent sequences of images from plain text stories, presenting a neural pipeline that combines a pre-trained Large Language Model and a text-guided Latent Diffusion Model to achieve zero-shot generation. Experimental results show that the proposed method outperforms state-of-the-art image editing baselines.

Recent advancements in large-scale text-to-image models have opened new possibilities for guiding the creation of images through human-devised natural language. However, while prior literature has primarily focused on the generation of individual images, it is essential to consider the capability of these models to ensure coherency within a sequence of images to fulfill the demands of real-world applications such as storytelling. To address this, we present a novel neural pipeline for generating a coherent storybook from the plain text of a story. Specifically, we leverage a combination of a pre-trained Large Language Model and a text-guided Latent Diffusion Model to generate coherent images. While previous story synthesis frameworks typically require a large-scale text-to-image model trained on expensive image-caption pairs to maintain coherency, we employ simple textual inversion techniques along with detector-based semantic image editing, which allows zero-shot generation of a coherent storybook. Experimental results show that our proposed method outperforms state-of-the-art image editing baselines.

