CV · Nov 25, 2025

RubricRL: Simple Generalizable Rewards for Text-to-Image Generation

Xuelu Feng, Yunsheng Li, Ziyu Wan, Zixuan Gao, Junsong Yuan, Dongdong Chen, Chunming Qiao

arXiv:2511.20651v1 · 3 citations

Originality: Incremental advance

AI Analysis

This work addresses the limited interpretability and flexibility of reward design for text-to-image generation and offers a more user-controllable approach, though it appears incremental because it builds on existing RL alignment methods.

The paper tackles the challenge of designing interpretable rewards for aligning text-to-image models with human preferences by proposing RubricRL, a rubric-based framework that dynamically constructs decomposable checklists of visual criteria tailored to each prompt, resulting in improved prompt faithfulness and visual detail.

Reinforcement learning (RL) has recently emerged as a promising approach for aligning text-to-image generative models with human preferences. A key challenge, however, lies in designing effective and interpretable rewards. Existing methods often rely on either composite metrics (e.g., CLIP, OCR, and realism scores) with fixed weights or a single scalar reward distilled from human preference models, which can limit interpretability and flexibility. We propose RubricRL, a simple and general framework for rubric-based reward design that offers greater interpretability, composability, and user control. Instead of using a black-box scalar signal, RubricRL dynamically constructs a structured rubric for each prompt: a decomposable checklist of fine-grained visual criteria such as object correctness, attribute accuracy, OCR fidelity, and realism, tailored to the input text. Each criterion is independently evaluated by a multimodal judge (e.g., o4-mini), and a prompt-adaptive weighting mechanism emphasizes the most relevant dimensions. This design not only produces interpretable and modular supervision signals for policy optimization (e.g., GRPO or PPO), but also enables users to directly adjust which aspects to reward or penalize. Experiments with an autoregressive text-to-image model demonstrate that RubricRL improves prompt faithfulness, visual detail, and generalizability, while offering a flexible and extensible foundation for interpretable RL alignment across text-to-image architectures.
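
To make the reward structure described in the abstract concrete, here is a minimal Python sketch of a rubric-based reward feeding a GRPO-style advantage. It is an illustration under stated assumptions, not the authors' implementation: the `Criterion` class, the `rubric_reward` and `group_advantages` helpers, the weighted-average aggregation, and the toy judge are all hypothetical. In the paper's setting the per-criterion scores would come from a multimodal judge and the weights from the prompt-adaptive weighting mechanism, neither of which is specified here.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class Criterion:
    """One rubric item (hypothetical type); `weight` stands in for prompt-adaptive weighting."""
    name: str
    weight: float


def rubric_reward(rubric: Sequence[Criterion],
                  judge: Callable[[Any, Criterion], float],
                  image: Any) -> float:
    """Weighted average of per-criterion judge scores, each clamped to [0, 1].

    Assumption: weighted averaging is one plausible aggregation rule; the
    abstract does not state the exact formula.
    """
    total_weight = sum(c.weight for c in rubric)
    if total_weight <= 0:
        raise ValueError("rubric needs positive total weight")
    weighted = sum(c.weight * min(1.0, max(0.0, judge(image, c))) for c in rubric)
    return weighted / total_weight


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: standardize rewards within one prompt's sample group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy usage: "images" are dicts of precomputed scores and the judge is a lookup.
# A real judge would query a multimodal model once per criterion.
rubric = [
    Criterion("object correctness", 2.0),
    Criterion("OCR fidelity", 1.0),
    Criterion("realism", 1.0),
]


def toy_judge(image: dict, criterion: Criterion) -> float:
    return image[criterion.name]


samples = [
    {"object correctness": 1.0, "OCR fidelity": 0.0, "realism": 1.0},
    {"object correctness": 0.0, "OCR fidelity": 1.0, "realism": 1.0},
]
rewards = [rubric_reward(rubric, toy_judge, s) for s in samples]  # [0.75, 0.5]
advantages = group_advantages(rewards)
```

Because each criterion keeps its own score and weight, a user could change what is rewarded by editing the rubric (for example, setting a weight to zero) without retraining a reward model, which is the kind of control the abstract describes.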
