CV · Mar 7

TIQA: Human-Aligned Text Quality Assessment in Generated Images

arXiv:2603.07119v1
Predicted impact: top 57% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This work tackles the assessment of text rendering quality in generated images, a prerequisite for measuring and improving text-to-image models. It offers an incremental improvement in evaluation methodology.

The paper addresses the persistent issue of poor text rendering in text-to-image models by introducing Text-in-Image Quality Assessment (TIQA), a task that predicts human-aligned scalar quality scores for rendered text. The authors propose ANTIQA, a lightweight method that improves PLCC with human scores over OCR-confidence, VLM-judge, and generic NR-IQA baselines by at least ~0.05 on TIQA-Crops and ~0.08 on TIQA-Images. Using ANTIQA to select the best of 5 generations improves human-rated text quality by +14% on average.
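The best-of-5 selection mentioned above can be sketched as a generic reranking step. This is a minimal illustration, not the authors' pipeline: `select_best` and `toy_score` are hypothetical names, and the scoring function stands in for a learned quality model such as ANTIQA, whose actual interface is not described on this page.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def select_best(candidates: Sequence[T], score_fn: Callable[[T], float]) -> T:
    """Return the candidate with the highest quality score.

    `score_fn` is a placeholder for any text-quality scorer; higher is
    assumed to mean better. Ties resolve to the earliest candidate.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score_fn)


def toy_score(candidate: dict) -> float:
    """Hypothetical scorer: reads a precomputed score from the candidate."""
    return candidate["score"]


# Five generations for one prompt, each with an assumed precomputed score.
generations = [
    {"id": 0, "score": 0.41},
    {"id": 1, "score": 0.77},
    {"id": 2, "score": 0.52},
    {"id": 3, "score": 0.77},
    {"id": 4, "score": 0.30},
]
best = select_best(generations, toy_score)
```

In a real pipeline the scorer would run on the generated images (or their text crops) rather than on stored numbers; the selection logic itself stays the same.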

Text rendering remains a persistent failure mode of modern text-to-image (T2I) models, yet existing evaluations rely on OCR correctness or VLM-based judging procedures that are poorly aligned with perceptual text artifacts. We introduce Text-in-Image Quality Assessment (TIQA), a task that predicts a scalar quality score that matches human judgments of rendered-text fidelity within cropped text regions. We release two MOS-labeled datasets: TIQA-Crops (10k text crops) and TIQA-Images (1,500 images), spanning 20+ T2I models, including proprietary ones. We also propose ANTIQA, a lightweight method with text-specific biases, and show that it improves correlation with human scores over OCR confidence, VLM judges, and generic NR-IQA metrics by at least $\sim 0.05$ on TIQA-Crops and $\sim 0.08$ on TIQA-Images, as measured by PLCC. Finally, we show that TIQA models are valuable in downstream tasks: for example, selecting the best-of-5 generations with ANTIQA improves human-rated text quality by $+14\%$ on average, demonstrating practical value for filtering and reranking in generation pipelines.
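For readers unfamiliar with the metric, PLCC is the Pearson linear correlation coefficient between predicted scores and mean opinion scores (MOS). The sketch below computes the raw coefficient in plain Python. The function name `plcc` and the sample values are illustrative assumptions, and the abstract does not say whether any score remapping is applied before the correlation is computed, so none is applied here.

```python
import math
from typing import Sequence


def plcc(pred: Sequence[float], mos: Sequence[float]) -> float:
    """Pearson linear correlation between predictions and human MOS."""
    if len(pred) != len(mos) or len(pred) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_m = sum(mos) / n
    # Covariance and variances, left unnormalized because n cancels out.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(pred, mos))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_m = sum((m - mean_m) ** 2 for m in mos)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_m)


# Made-up scores for illustration only; not data from the paper.
predicted = [0.2, 0.5, 0.6, 0.9]
human_mos = [1.5, 2.8, 3.1, 4.6]
r = plcc(predicted, human_mos)
```

A gain of about 0.05 in this coefficient, as reported for TIQA-Crops, means the predicted scores track the human ratings more linearly than the baseline scores do.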

