CV · GR · LG · Nov 16, 2023

The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

arXiv:2311.10093v4 · 70 citations · h-index: 23
Originality: Incremental advance
AI Analysis

The work addresses a crucial issue for users in applications such as story visualization and game development, offering an incremental improvement over existing methods.

The paper tackles the problem of generating consistent characters in text-to-image diffusion models. It proposes an automated iterative method that improves the balance between prompt alignment and identity consistency, as shown by a quantitative analysis and a user study.
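Identity consistency can be quantified in several ways. One simple choice, assumed here rather than taken from the paper, is the mean pairwise cosine similarity between embeddings of images generated for the same character. The embedding model is left abstract: `identity_consistency` is a hypothetical helper that takes an N x D array of image features.

```python
import numpy as np

def identity_consistency(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine similarity among N image embeddings (shape N x D).

    Hypothetical helper: which feature extractor produced the embeddings is
    an assumption. Rows are assumed to be non-zero vectors.
    """
    if embeddings.shape[0] < 2:
        raise ValueError("need at least two images")
    # L2-normalize each row so that dot products equal cosine similarities
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = sim.shape[0]
    # Average only the off-diagonal entries (drop each image's self-similarity)
    return float((sim.sum() - np.trace(sim)) / (n * (n - 1)))

feats = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
print(identity_consistency(feats))  # → 1.0
```

Embeddings pointing in the same direction score 1.0 and mutually orthogonal embeddings score 0.0, so higher values indicate a more consistent identity across the set.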

Recent advances in text-to-image generation models have unlocked vast potential for visual creativity. However, users of these models struggle to generate consistent characters, a crucial aspect of numerous real-world applications such as story visualization, game development, asset design, and advertising. Current methods typically rely on multiple pre-existing images of the target character or involve labor-intensive manual processes. In this work, we propose a fully automated solution for consistent character generation, with a text prompt as the sole input. We introduce an iterative procedure that, at each stage, identifies a coherent set of images sharing a similar identity and extracts a more consistent identity from this set. Our quantitative analysis demonstrates that our method strikes a better balance between prompt alignment and identity consistency than the baseline methods, and these findings are reinforced by a user study. To conclude, we showcase several practical applications of our approach.
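The abstract describes the iterative procedure only at a high level. The sketch below illustrates its shape under stated assumptions: images are represented by embedding vectors, coherent sets are found with plain k-means, the "most coherent" set is the cluster with the smallest mean distance to its own mean, and the loop stops once the whole generated set's spread drops below a threshold. `generate`, `embed`, and `extract_identity` are hypothetical callables standing in for the diffusion model, an image feature extractor, and the identity-extraction step; none of these names, thresholds, or criteria come from the paper.

```python
import numpy as np

def kmeans(x, k, iters=50, seed=0):
    """Plain k-means on the rows of x; returns a cluster label per row."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign every embedding to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels

def most_cohesive_cluster(x, k, min_size, seed=0):
    """Indices of the tightest k-means cluster with at least min_size members."""
    labels = kmeans(x, k, seed=seed)
    best, best_score = None, np.inf
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if len(members) < min_size:
            continue  # too small to define an identity reliably
        # Cohesion (assumed): mean distance of members to their own mean
        center = x[members].mean(axis=0)
        score = np.linalg.norm(x[members] - center, axis=1).mean()
        if score < best_score:
            best, best_score = members, score
    if best is None:
        raise ValueError("no cluster has min_size members")
    return best

def iterate_identity(generate, embed, extract_identity,
                     n_images=32, k=4, min_size=4, max_rounds=10, tol=0.05):
    """Hypothetical driver: refine an identity until generations agree."""
    identity = None  # first gallery is generated from the prompt alone
    for rnd in range(max_rounds):
        images = generate(identity, n_images)
        feats = embed(images)
        # Convergence test (assumed): mean distance of embeddings to their mean
        spread = np.linalg.norm(feats - feats.mean(axis=0), axis=1).mean()
        if spread < tol:
            return identity, rnd
        # Keep the most coherent subset and extract a new identity from it
        members = most_cohesive_cluster(feats, k, min_size, seed=rnd)
        identity = extract_identity([images[i] for i in members])
    return identity, max_rounds

# Toy stand-ins: "images" are 8-d vectors and the embedder is np.stack.
rng = np.random.default_rng(0)

def toy_generate(identity, n):
    if identity is None:
        return list(rng.normal(scale=3.0, size=(n, 8)))  # unrelated characters
    return list(identity + rng.normal(scale=0.01, size=(n, 8)))  # near the identity

identity, rounds = iterate_identity(toy_generate, np.stack,
                                    lambda imgs: np.mean(imgs, axis=0))
print(rounds)  # → 1
```

Swapping in a different clustering method, cohesion score, or stopping rule only changes `most_cohesive_cluster` or the `spread` test; the outer generate, select, extract loop stays the same.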

Code Implementations: 1 repo