CL AI · Jun 17, 2025

GRAM: A Generative Foundation Reward Model for Reward Generalization

Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Qiaozhi He, Murun Yang, Bei Li, Tong Xiao, Chunliang Zhang, Tongran Liu, Jingbo Zhu

arXiv:2506.14175v223.022 citations · h-index: 13 · Has Code · ICML

Originality: Incremental advance

AI Analysis

This work addresses the challenge of reward generalization in AI alignment, offering a foundation model that reduces the need for task-specific fine-tuning, though it is incremental in building on existing generative and discriminative methods.

The paper tackles the problem of aligning large language models by developing a generative reward model trained with both unlabeled and labeled data, achieving significant performance improvements across tasks like response ranking and reinforcement learning from human feedback.

In aligning large language models (LLMs), reward models have played an important role, but are standardly trained as discriminative models and rely only on labeled human preference data. In this paper, we explore methods that train reward models using both unlabeled and labeled data. Building on the generative models in LLMs, we develop a generative reward model that is first trained via large-scale unsupervised learning and then fine-tuned via supervised learning. We also show that by using label smoothing, we are in fact optimizing a regularized pairwise ranking loss. This result, in turn, provides a new view of training reward models, which links generative models and discriminative models under the same class of training objectives. The outcome of these techniques is a foundation reward model, which can be applied to a wide range of tasks with little or no further fine-tuning effort. Extensive experiments show that this model generalizes well across several tasks, including response ranking, reinforcement learning from human feedback, and task adaptation with fine-tuning, achieving significant performance improvements over several strong baseline models.
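The link between label smoothing and a regularized pairwise ranking loss can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's implementation: it assumes the generative reward model chooses between two label tokens ("A" or "B") with logits `z_a` and `z_b`, so that the probability of preferring A is sigmoid(z_a - z_b). Under that assumption, cross-entropy with smoothing weight `eps` equals the Bradley-Terry ranking loss plus a margin penalty `eps * (z_a - z_b)`. All function names and the exact form of the regularizer here are hypothetical; the paper's own derivation may be stated differently.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def smoothed_cross_entropy(z_a: float, z_b: float, eps: float) -> float:
    """Label-smoothed cross-entropy over two label tokens, with A preferred.

    Assumes P(A) = sigmoid(z_a - z_b) and P(B) = sigmoid(z_b - z_a),
    i.e. a softmax restricted to the two label logits.
    """
    delta = z_a - z_b
    return -(1.0 - eps) * log_sigmoid(delta) - eps * log_sigmoid(-delta)


def regularized_ranking_loss(z_a: float, z_b: float, eps: float) -> float:
    """Bradley-Terry pairwise ranking loss plus a penalty on the logit margin."""
    delta = z_a - z_b
    return -log_sigmoid(delta) + eps * delta


if __name__ == "__main__":
    # Arbitrary illustrative logits, not values from the paper.
    for z_a, z_b in [(2.0, -1.0), (0.3, 0.9), (-4.0, 1.5)]:
        ce = smoothed_cross_entropy(z_a, z_b, eps=0.1)
        rk = regularized_ranking_loss(z_a, z_b, eps=0.1)
        print(f"delta={z_a - z_b:+.2f}  smoothed CE={ce:.6f}  ranking+reg={rk:.6f}")
```

The two functions agree because log sigmoid(d) - log sigmoid(-d) = d, so the smoothing term reduces to a linear penalty on the margin. That penalty discourages the model from pushing the preferred and rejected logits arbitrarily far apart.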
