CL | CV | Dec 11, 2024

Fast Prompt Alignment for Text-to-Image Generation

arXiv:2412.08639v1 | 5 citations | h-index: 6 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses inefficient prompt alignment in real-time, high-demand text-to-image generation settings, offering a scalable alternative to iterative methods.

The paper tackles the challenge of aligning complex textual prompts with generated visuals in text-to-image generation by introducing Fast Prompt Alignment (FPA), a prompt optimization framework that achieves competitive text-image alignment scores at a fraction of the processing time, as validated on datasets like COCO Captions and PartiPrompts.

Text-to-image generation has advanced rapidly, yet aligning complex textual prompts with generated visuals remains challenging, especially with intricate object relationships and fine-grained details. This paper introduces Fast Prompt Alignment (FPA), a prompt optimization framework that leverages a one-pass approach, enhancing text-to-image alignment efficiency without the iterative overhead typical of current methods like OPT2I. FPA uses large language models (LLMs) for single-iteration prompt paraphrasing, followed by fine-tuning or in-context learning with optimized prompts to enable real-time inference, reducing computational demands while preserving alignment fidelity. Extensive evaluations on the COCO Captions and PartiPrompts datasets demonstrate that FPA achieves competitive text-image alignment scores at a fraction of the processing time, as validated through both automated metrics (TIFA, VQA) and human evaluation. A human study with expert annotators further reveals a strong correlation between human alignment judgments and automated scores, underscoring the robustness of FPA's improvements. The proposed method showcases a scalable, efficient alternative to iterative prompt optimization, enabling broader applicability in real-time, high-demand settings. The codebase is provided to facilitate further research: https://github.com/tiktok/fast_prompt_alignment
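The one-pass idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`one_pass_align`, `build_training_pairs`, `toy_paraphrase`, `toy_score`) are hypothetical, and the toy paraphraser and scorer are stand-ins for an LLM call and for an image-based alignment metric such as TIFA or VQA. The sketch only shows the control flow: paraphrase once, score every candidate against the original prompt, and keep the best one as a training or in-context example.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not from the paper's codebase):
#   Paraphraser(prompt, n)      -> up to n rewritten prompts from ONE LLM call
#   Scorer(original, candidate) -> alignment score; higher is better
Paraphraser = Callable[[str, int], List[str]]
Scorer = Callable[[str, str], float]


def one_pass_align(prompt: str, paraphrase: Paraphraser, score: Scorer,
                   n_candidates: int = 4) -> Tuple[str, float]:
    """Pick the best-scoring prompt among the original and one batch of paraphrases."""
    candidates = [prompt] + paraphrase(prompt, n_candidates)
    scored = [(score(prompt, c), c) for c in candidates]
    # max() returns the first maximum, so ties keep the original prompt.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def build_training_pairs(prompts: List[str], paraphrase: Paraphraser,
                         score: Scorer) -> List[Tuple[str, str]]:
    """Collect (original, optimized) pairs for fine-tuning or in-context examples."""
    return [(p, one_pass_align(p, paraphrase, score)[0]) for p in prompts]


def toy_paraphrase(prompt: str, n: int) -> List[str]:
    """Stand-in for a single LLM paraphrasing call."""
    options = [f"{prompt}, highly detailed", f"a photo of {prompt}", prompt.split()[0]]
    return options[:n]


def toy_score(original: str, candidate: str) -> float:
    """Stand-in for TIFA/VQA: word coverage plus an arbitrary bonus for 'photo'."""
    wanted = set(original.split())
    present = set(candidate.split())
    coverage = len(wanted & present) / len(wanted)
    return coverage + (0.1 if "photo" in present else 0.0)


if __name__ == "__main__":
    print(build_training_pairs(["a red cube on a blue sphere"],
                               toy_paraphrase, toy_score))
```

In a real setup the scorer would generate an image from each candidate and evaluate it against the original prompt, so the scoring step dominates the cost. Because there is only one paraphrasing round rather than an iterative loop, that cost is paid once per prompt when building the pairs, not at inference time.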

Code Implementations: 1 repo
