CV · Oct 11, 2023

OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation

Microsoft
arXiv:2310.07749v2 · 16 citations · h-index: 52
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating coherent mixed-media content for applications such as education and design. The contribution is incremental, since it builds on existing LLM and T2I models.

The authors tackle open-domain interleaved image-text generation with OpenLEAF, a framework that prompts LLMs and pre-trained T2I models and feeds global context into the T2I models to improve entity and style consistency. LMM-based evaluation, validated against human assessment, indicates high-quality output on tasks such as storytelling and webpage generation.

This work investigates a challenging task named open-domain interleaved image-text generation, which generates interleaved texts and images following an input query. We propose a new interleaved generation framework based on prompting large language models (LLMs) and pre-trained text-to-image (T2I) models, namely OpenLEAF. In OpenLEAF, the LLM generates textual descriptions, coordinates T2I models, creates visual prompts for generating images, and incorporates global contexts into the T2I models. This global context improves the entity and style consistencies of images in the interleaved generation. For model assessment, we first propose to use large multi-modal models (LMMs) to evaluate the entity and style consistencies of open-domain interleaved image-text sequences. According to the LMM evaluation on our constructed evaluation set, the proposed interleaved generation framework can generate high-quality image-text content for various domains and applications, such as how-to question answering, storytelling, graphical story rewriting, and webpage/poster generation tasks. Moreover, we validate the effectiveness of the proposed LMM evaluation technique with human assessment. We hope our proposed framework, benchmark, and LMM evaluation could help establish the intriguing interleaved image-text generation task.
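The control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name below (`Segment`, `build_interleaved`, and the `stub_*` functions) is hypothetical, and the stubs stand in for real LLM calls. The sketch shows only one idea, that a single global context string is shared across all visual prompts so that entities and style stay consistent from image to image.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    text: str          # text shown to the reader
    image_prompt: str  # prompt that would be sent to a T2I model


def build_interleaved(
    query: str,
    plan_text: Callable[[str], List[str]],        # hypothetical: LLM writes the text segments
    describe_scene: Callable[[str], str],         # hypothetical: LLM writes one visual prompt per segment
    global_context: Callable[[List[str]], str],   # hypothetical: LLM summarises shared entities and style
) -> List[Segment]:
    texts = plan_text(query)
    # Compute the context once and reuse it for every image prompt.
    context = global_context(texts)
    return [Segment(t, f"{context}. {describe_scene(t)}") for t in texts]


# Stub callables standing in for real model calls (assumptions, for illustration only).
def stub_plan(query: str) -> List[str]:
    return [f"Step {i}: {query}" for i in (1, 2, 3)]


def stub_scene(text: str) -> str:
    return f"An illustration of: {text}"


def stub_context(texts: List[str]) -> str:
    return "Style: watercolor; main character: a red fox"


segments = build_interleaved("how to plant a tree", stub_plan, stub_scene, stub_context)
for seg in segments:
    print(seg.text, "|", seg.image_prompt)
```

In a real system each stub would be replaced by a prompted LLM call, and each `image_prompt` would be passed to a T2I model. How OpenLEAF actually injects global context into the T2I model is not specified in the abstract, so the string concatenation here is only a placeholder.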

