CV, Oct 12, 2023

Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation

Microsoft, UW
arXiv:2310.08541v20.3933 citations, h-index: 52
AI Analysis 55

The paper addresses the challenge of efficiently converting abstract ideas into effective image prompts for users of text-to-image models. The contribution is incremental, since it builds on existing large multimodal models.

The paper tackles the problem of automatically generating high-quality images from high-level ideas. It introduces Idea2Img, a system that uses GPT-4V for iterative self-refinement: the system probes a text-to-image model and produces progressively better prompts. According to the user preference studies, the resulting images have improved semantic and visual quality.

We introduce "Idea to Image," a system that enables multimodal iterative self-refinement with GPT-4V(ision) for automatic image design and generation. Humans can quickly identify the characteristics of different text-to-image (T2I) models via iterative explorations. This enables them to efficiently convert their high-level generation ideas into effective T2I prompts that can produce good images. We investigate if systems based on large multimodal models (LMMs) can develop analogous multimodal self-refinement abilities that enable exploring unknown models or environments via self-refining tries. Idea2Img cyclically generates revised T2I prompts to synthesize draft images, and provides directional feedback for prompt revision, both conditioned on its memory of the probed T2I model's characteristics. The iterative self-refinement brings Idea2Img various advantages over vanilla T2I models. Notably, Idea2Img can process input ideas with interleaved image-text sequences, follow ideas with design instructions, and generate images of better semantic and visual qualities. The user preference study validates the efficacy of multimodal iterative self-refinement on automatic image design and generation.
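The cycle described in the abstract (revise the prompt, synthesize a draft image, give feedback, and remember each round) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `idea2img_loop`, `Round`, `Memory`, the callable parameters, the `max_rounds` limit, and the stopping check are all assumptions introduced here. In a real system the callables would wrap an LMM such as GPT-4V and a T2I model; below they are replaced by toy string functions so the loop can run on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Round:
    """One probe of the T2I model (hypothetical record type)."""
    prompt: str
    image: str      # stand-in for a real image object
    feedback: str


@dataclass
class Memory:
    """History of earlier rounds that later steps are conditioned on."""
    rounds: List[Round] = field(default_factory=list)


def idea2img_loop(
    idea: str,
    revise_prompt: Callable[[str, Memory], str],
    synthesize: Callable[[str], str],
    give_feedback: Callable[[str, str, str, Memory], str],
    is_good_enough: Callable[[str], bool],
    max_rounds: int = 3,
) -> Memory:
    """Run the revise -> synthesize -> feedback cycle.

    The stopping rule (is_good_enough, max_rounds) is an assumption of
    this sketch; the abstract does not specify one.
    """
    memory = Memory()
    for _ in range(max_rounds):
        prompt = revise_prompt(idea, memory)              # conditioned on memory
        image = synthesize(prompt)                        # draft image
        feedback = give_feedback(idea, prompt, image, memory)
        memory.rounds.append(Round(prompt, image, feedback))
        if is_good_enough(feedback):
            break
    return memory


# Toy stand-ins so the sketch runs without any model access.
def toy_revise(idea: str, memory: Memory) -> str:
    if not memory.rounds:
        return idea
    last = memory.rounds[-1]
    return last.prompt + ", " + last.feedback


def toy_synthesize(prompt: str) -> str:
    return "<image of: " + prompt + ">"


def toy_feedback(idea: str, prompt: str, image: str, memory: Memory) -> str:
    return "ok" if "highly detailed" in prompt else "highly detailed"


result = idea2img_loop(
    "a dog playing chess",
    toy_revise,
    toy_synthesize,
    toy_feedback,
    is_good_enough=lambda fb: fb == "ok",
)
```

Passing the model calls in as functions keeps the loop independent of any particular LMM or T2I backend, which matches the abstract's framing of the T2I model as something to be probed rather than known in advance.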
