CL · AI · CV · Jun 10, 2025

Re-Thinking the Automatic Evaluation of Image-Text Alignment in Text-to-Image Models

arXiv:2506.08480v1 · h-index: 3
Originality: Synthesis-oriented
AI Analysis

This work addresses the unreliable evaluation of image-text alignment in text-to-image generation, a problem that matters to both researchers and practitioners. The contribution is incremental, however: it critiques existing methods without introducing a new evaluation paradigm.

The paper identifies shortcomings in current frameworks for evaluating image-text alignment, shows empirically that they fail to satisfy key reliability properties across a range of metrics and models, and proposes recommendations for improvement.

Text-to-image models often struggle to generate images that precisely match textual prompts. Prior research has extensively studied the evaluation of image-text alignment in text-to-image generation. However, existing evaluations primarily focus on agreement with human assessments, neglecting other critical properties of a trustworthy evaluation framework. In this work, we first identify two key aspects that a reliable evaluation should address. We then empirically demonstrate that current mainstream evaluation frameworks fail to fully satisfy these properties across a diverse range of metrics and models. Finally, we propose recommendations for improving image-text alignment evaluation.
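For readers unfamiliar with how "agreement with human assessments" is usually quantified, here is a minimal sketch. It assumes agreement is measured as Kendall's tau-b rank correlation between an automatic alignment metric's scores and human ratings on the same image-prompt pairs. This is a common convention, not necessarily the protocol used in the paper. The function name `kendall_tau_b` and the example scores are hypothetical and purely illustrative; they are not data from the paper.

```python
from itertools import combinations
import math


def kendall_tau_b(metric_scores, human_ratings):
    """Kendall's tau-b between metric scores and human ratings (illustrative sketch).

    Counts concordant/discordant pairs and corrects for ties, so it works
    with coarse human rating scales (e.g. 1-5 Likert scores).
    """
    if len(metric_scores) != len(human_ratings) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")

    concordant = discordant = tied_metric_only = tied_human_only = 0
    for (m1, h1), (m2, h2) in combinations(zip(metric_scores, human_ratings), 2):
        dm, dh = m1 - m2, h1 - h2
        if dm == 0 and dh == 0:
            continue  # tied in both: excluded from the tau-b denominator terms
        if dm == 0:
            tied_metric_only += 1
        elif dh == 0:
            tied_human_only += 1
        elif (dm > 0) == (dh > 0):
            concordant += 1  # metric and humans order this pair the same way
        else:
            discordant += 1  # metric and humans disagree on this pair

    # Pairs untied in each variable: (n0 - n1) and (n0 - n2) in the usual notation.
    denom = math.sqrt(
        (concordant + discordant + tied_human_only)
        * (concordant + discordant + tied_metric_only)
    )
    if denom == 0:
        return float("nan")  # one of the sequences is constant
    return (concordant - discordant) / denom


# Hypothetical scores for four image-prompt pairs.
metric = [0.10, 0.40, 0.35, 0.80]
human = [1, 2, 3, 4]
print(kendall_tau_b(metric, human))  # 5 concordant, 1 discordant pair -> 0.666...
```

A high correlation of this kind is the property the abstract says existing evaluations focus on. The paper's point is that such agreement alone does not establish that an evaluation framework is trustworthy.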
