CV | GR | LG | Apr 13, 2023

Expressive Text-to-Image Generation with Rich Text

arXiv:2304.06720v4 | 102 citations | h-index: 23
Originality: Incremental advance
AI Analysis

This paper addresses the difficulty users have in accurately describing complex scenes for text-to-image synthesis, though it is an incremental advance over existing methods.

The paper tackles the limited customization of plain-text prompts in text-to-image generation by proposing a rich-text editor in which users specify per-word attributes such as color and importance. These attributes enable local style control and precise color rendering, and the paper reports quantitative improvements over baselines.

Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to write and challenging for text encoders to interpret. To address these challenges, we propose using a rich-text editor supporting formats such as font style, size, color, and footnote. We extract each word's attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis. We achieve these capabilities through a region-based diffusion process. We first obtain each word's region based on attention maps of a diffusion process using plain text. For each region, we enforce its text attributes by creating region-specific detailed prompts and applying region-specific guidance, and maintain its fidelity against plain-text generation through region-based injections. We present various examples of image generation from rich text and demonstrate that our method outperforms strong baselines with quantitative evaluations.
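The region-based process described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes per-word attention maps are already available as a NumPy array, derives each word's region by simple normalization, argmax, and thresholding, and overwrites a plain-text prediction with region-specific predictions inside each region. The function names `word_regions` and `blend_region_predictions` and the `threshold` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np


def word_regions(attn_maps, threshold=0.3):
    """Turn per-word attention maps into disjoint boolean region masks.

    attn_maps: array of shape (num_words, H, W) with non-negative values.
    The thresholding rule here is an assumption, not the paper's procedure.
    """
    num_words = attn_maps.shape[0]
    # Scale each word's map so its strongest response equals 1.
    peak = attn_maps.reshape(num_words, -1).max(axis=1)
    norm = attn_maps / np.maximum(peak, 1e-8)[:, None, None]
    # Each pixel belongs to the word that attends to it most strongly,
    # but only if that response clears the threshold.
    owner = norm.argmax(axis=0)
    return np.stack(
        [(owner == i) & (norm[i] >= threshold) for i in range(num_words)]
    )


def blend_region_predictions(base_pred, region_preds, masks):
    """Replace the plain-text prediction inside each word's region.

    base_pred: (C, H, W) prediction from the plain-text prompt.
    region_preds: (num_words, C, H, W) predictions from region-specific prompts.
    masks: (num_words, H, W) boolean masks from word_regions.
    """
    out = base_pred.copy()
    for pred, mask in zip(region_preds, masks):
        # Broadcast the (H, W) mask over the channel axis.
        out = np.where(mask[None], pred, out)
    return out
```

Pixels that no word claims keep the plain-text prediction, which loosely mirrors the abstract's goal of preserving fidelity to the plain-text generation outside the edited regions.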

Code Implementations: 1 repo
