CV, LG · Oct 29, 2025

MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency

Nicolas Dufour, Lucas Degeorge, Arijit Ghosh, Vicky Kalogeiton, David Picard

arXiv:2510.25897v1 · 1 citation · h-index: 10

Originality: Incremental advance

AI Analysis

This work targets image quality and training efficiency in text-to-image generation. It appears incremental because it builds on existing reward-based methods.

The paper tackles the problem of aligning text-to-image models with user preferences without harming diversity or efficiency. It does so by conditioning the model on multiple reward models during training, and it reports state-of-the-art benchmark performance and faster training.

Current text-to-image generative models are trained on large uncurated datasets to enable diverse generation capabilities. However, this does not align well with user preferences. Recently, reward models have been specifically designed to perform post-hoc selection of generated images and align them to a reward, typically user preference. Discarding informative data in this way, together with optimizing for a single reward, tends to harm diversity, semantic fidelity, and efficiency. Instead of this post-processing, we propose to condition the model on multiple reward models during training to let the model learn user preferences directly. We show that this not only dramatically improves the visual quality of the generated images but also significantly speeds up training. Our proposed method, called MIRO, achieves state-of-the-art performance on the GenEval compositional benchmark and user-preference scores (PickAScore, ImageReward, HPSv2).
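The abstract does not spell out how the reward conditioning is built, so the following is only a minimal sketch of the general idea, under stated assumptions: every training pair is scored by several reward functions, each reward is rank-normalized to [0, 1] so that different scales are comparable, and the resulting vector is kept alongside the sample as a conditioning signal instead of being used to filter data. The function names (`quantile_normalize`, `build_conditioning`, `inference_conditioning`), the rank normalization, and the toy reward functions are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for a real reward model: maps an (image, caption)
# pair to a raw float score. Real reward models would take pixel data.
RewardFn = Callable[[str, str], float]


def quantile_normalize(scores: Sequence[float]) -> List[float]:
    """Map raw scores to [0, 1] by rank (assumed normalization, not from the paper)."""
    n = len(scores)
    if n == 1:
        return [1.0]
    # Indices sorted from lowest to highest score; ties keep input order.
    order = sorted(range(n), key=lambda i: scores[i])
    normalized = [0.0] * n
    for rank, idx in enumerate(order):
        normalized[idx] = rank / (n - 1)
    return normalized


def build_conditioning(
    dataset: Sequence[Tuple[str, str]],
    reward_fns: Dict[str, RewardFn],
) -> List[List[float]]:
    """Return one reward vector per sample; no sample is discarded."""
    names = sorted(reward_fns)  # fixed order so vector positions are stable
    columns = {
        name: quantile_normalize([reward_fns[name](img, cap) for img, cap in dataset])
        for name in names
    }
    return [[columns[name][i] for name in names] for i in range(len(dataset))]


def inference_conditioning(num_rewards: int, target: float = 1.0) -> List[float]:
    """At sampling time, request a high value for every reward (assumption)."""
    return [target] * num_rewards


if __name__ == "__main__":
    # Toy data and toy rewards, purely illustrative.
    data = [("img_a", "a red cube"), ("img_b", "a blurry photo")]
    rewards = {
        "caption_length": lambda img, cap: float(len(cap)),
        "brevity": lambda img, cap: -float(len(cap)),
    }
    for sample, vector in zip(data, build_conditioning(data, rewards)):
        print(sample, vector)
    print("inference:", inference_conditioning(len(rewards)))
```

In a real training loop, the per-sample vector would be fed to the generator as an extra conditioning input next to the text prompt. How that input is embedded and injected is a design choice this sketch leaves open.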
