CV LG · Apr 12, 2023

ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation

Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, Yuxiao Dong

Tsinghua

arXiv:2304.05977v455.11156 citations · h-index: 47 · Has Code

Originality Highly original

AI Analysis

This work addresses how to evaluate and improve text-to-image models so that their outputs align better with human preferences. It is a significant but incremental advance in the field.

The authors align text-to-image generation with human preferences through two contributions: ImageReward, a reward model trained on 137k expert comparisons that outperforms existing metrics in human evaluation, and ReFL (Reward Feedback Learning), a tuning algorithm that improves diffusion models by optimizing them against this scorer.

We present a comprehensive solution to learn and improve text-to-image models from human preference feedback. To begin with, we build ImageReward -- the first general-purpose text-to-image human preference reward model -- to effectively encode human preferences. Its training is based on our systematic annotation pipeline including rating and ranking, which collects 137k expert comparisons to date. In human evaluation, ImageReward outperforms existing scoring models and metrics, making it a promising automatic metric for evaluating text-to-image synthesis. On top of it, we propose Reward Feedback Learning (ReFL), a direct tuning algorithm to optimize diffusion models against a scorer. Both automatic and human evaluation support ReFL's advantages over compared methods. All code and datasets are provided at https://github.com/THUDM/ImageReward.
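
The abstract says the reward model is trained on comparisons derived from ranking annotations, but it does not state the objective. As a rough illustration only, the sketch below assumes a common pairwise preference loss of the form -log(sigmoid(s_preferred - s_rejected)). The function names `ranking_to_pairs` and `pairwise_preference_loss` are hypothetical and are not taken from the ImageReward repository, and the scores are hand-written stand-ins for a reward model's outputs.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_ids):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs.

    Hypothetical helper: a ranking of k images yields k*(k-1)/2 comparisons.
    """
    return list(combinations(ranked_ids, 2))


def pairwise_preference_loss(scores, pairs):
    """Mean of -log(sigmoid(s_preferred - s_rejected)) over all pairs.

    `scores` maps an image id to a scalar reward (assumed to come from
    some reward model; here they are supplied by hand).
    """
    total = 0.0
    for better, worse in pairs:
        diff = scores[better] - scores[worse]
        # -log(sigmoid(diff)) == log(1 + exp(-diff)), computed stably
        total += max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return total / len(pairs)


if __name__ == "__main__":
    # One prompt, three generated images ranked best to worst by an annotator.
    pairs = ranking_to_pairs(["img_a", "img_b", "img_c"])
    agreeing = {"img_a": 2.0, "img_b": 0.5, "img_c": -1.0}
    reversed_scores = {"img_a": -1.0, "img_b": 0.5, "img_c": 2.0}
    print(pairwise_preference_loss(agreeing, pairs))
    print(pairwise_preference_loss(reversed_scores, pairs))
```

Under this assumed loss, scores that agree with the annotator's ranking give a lower value than scores that reverse it, which is the signal a reward model would be trained on. A real implementation would compute the scores with a neural network and backpropagate through them.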
