CV · Jul 25, 2025

Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment

arXiv:2507.19002v1 · 11 citations · h-index: 9 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a critical bottleneck in evaluating and optimizing high-quality image generation systems for AI and creative applications, though it is incremental as it builds on existing CLIP and BLIP architectures.

The paper addresses the failure of existing reward models to accurately assess images of high aesthetic quality. It introduces a new evaluation score, ICT (Image-Contained-Text), and a High-Preference (HP) score model, and reports an improvement in scoring accuracy of over 10% compared to previous methods.

Contemporary image generation systems have achieved high fidelity and superior aesthetic quality beyond basic text-image alignment. However, existing evaluation frameworks have failed to evolve in parallel. This study reveals that human preference reward models fine-tuned based on CLIP and BLIP architectures have inherent flaws: they inappropriately assign low scores to images with rich details and high aesthetic value, creating a significant discrepancy with actual human aesthetic preferences. To address this issue, we design a novel evaluation score, ICT (Image-Contained-Text) score, that achieves and surpasses the objectives of text-image alignment by assessing the degree to which images represent textual content. Building upon this foundation, we further train an HP (High-Preference) score model using solely the image modality to enhance image aesthetics and detail quality while maintaining text-image alignment. Experiments demonstrate that the proposed evaluation model improves scoring accuracy by over 10% compared to existing methods, and achieves significant results in optimizing state-of-the-art text-to-image models. This research provides theoretical and empirical support for evolving image generation technology toward higher-order human aesthetic preferences. Code is available at https://github.com/BarretBa/ICTHP.
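The abstract does not define "scoring accuracy". One common reading for preference reward models is pairwise accuracy: the fraction of human-labeled image pairs in which the model scores the preferred image higher. The sketch below illustrates that reading only and is not the authors' evaluation code. The names `pairwise_accuracy`, `toy_score`, and `toy_pairs` are hypothetical, and the toy scorer stands in for a real reward model.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical record layout: (prompt, preferred_image, rejected_image).
Pair = Tuple[str, object, object]


def pairwise_accuracy(
    score: Callable[[str, object], float],
    pairs: Sequence[Pair],
) -> float:
    """Fraction of pairs where the human-preferred image gets the higher score."""
    if not pairs:
        raise ValueError("pairs must be non-empty")
    correct = sum(
        1
        for prompt, preferred, rejected in pairs
        if score(prompt, preferred) > score(prompt, rejected)
    )
    return correct / len(pairs)


# Toy stand-in for a reward model (assumption): each "image" is a float,
# and the score is simply that value. A real scorer would run a network.
def toy_score(prompt: str, image: object) -> float:
    return float(image)


toy_pairs = [
    ("a cat", 0.9, 0.2),   # preferred image scores higher: counted as correct
    ("a dog", 0.4, 0.7),   # preferred image scores lower: counted as wrong
    ("a tree", 0.8, 0.1),  # correct
    ("a car", 0.6, 0.5),   # correct
]
accuracy = pairwise_accuracy(toy_score, toy_pairs)
```

Under this reading, an improvement of "over 10%" would correspond to the proposed scorer agreeing with human labels on a larger share of pairs than a baseline scorer evaluated on the same pairs.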

