CL | AI | Jun 17, 2025

GRAM: A Generative Foundation Reward Model for Reward Generalization

arXiv:2506.14175v2 | 22 citations | h-index: 13 | ICML
Originality: Incremental advance
AI Analysis

This work addresses reward generalization in AI alignment. It offers a foundation reward model that reduces the need for task-specific fine-tuning, though the advance is incremental: it builds on existing generative and discriminative reward-modeling methods.

The paper tackles the alignment of large language models by developing a generative reward model trained on both unlabeled and labeled data. It reports significant improvements over strong baseline models on response ranking, reinforcement learning from human feedback, and task adaptation with fine-tuning.

In aligning large language models (LLMs), reward models have played an important role, but are standardly trained as discriminative models and rely only on labeled human preference data. In this paper, we explore methods that train reward models using both unlabeled and labeled data. Building on the generative models in LLMs, we develop a generative reward model that is first trained via large-scale unsupervised learning and then fine-tuned via supervised learning. We also show that by using label smoothing, we are in fact optimizing a regularized pairwise ranking loss. This result, in turn, provides a new view of training reward models, which links generative models and discriminative models under the same class of training objectives. The outcome of these techniques is a foundation reward model, which can be applied to a wide range of tasks with little or no further fine-tuning effort. Extensive experiments show that this model generalizes well across several tasks, including response ranking, reinforcement learning from human feedback, and task adaptation with fine-tuning, achieving significant performance improvements over several strong baseline models.
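The link between label smoothing and a regularized pairwise ranking loss can be checked numerically. The sketch below is illustrative, not the paper's implementation. It assumes the generative reward model emits two label logits, one for "the first response is preferred" and one for "the second response is preferred", and that the label probability is a softmax over those two logits. The function names, the logit values, and the smoothing weight `eps` are hypothetical. Under that assumption, the label-smoothed cross-entropy equals the standard pairwise ranking loss, -log sigmoid(delta), plus a penalty eps * delta on the logit margin.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def smoothed_label_loss(z_pref: float, z_rej: float, eps: float) -> float:
    """Label-smoothed cross-entropy over two preference labels.

    Assumption: a softmax over two label logits, so the probability of the
    preferred label is sigmoid(z_pref - z_rej).
    """
    delta = z_pref - z_rej
    log_p_pref = log_sigmoid(delta)   # log-probability of the correct label
    log_p_rej = log_sigmoid(-delta)   # log-probability of the wrong label
    return -(1.0 - eps) * log_p_pref - eps * log_p_rej


def regularized_ranking_loss(z_pref: float, z_rej: float, eps: float) -> float:
    """Pairwise ranking loss plus a margin penalty weighted by eps."""
    delta = z_pref - z_rej
    return -log_sigmoid(delta) + eps * delta


if __name__ == "__main__":
    # Hypothetical logits and smoothing weight, chosen only for illustration.
    for z_pref, z_rej in [(2.0, -1.0), (0.3, 0.5), (-4.0, 3.0)]:
        a = smoothed_label_loss(z_pref, z_rej, eps=0.1)
        b = regularized_ranking_loss(z_pref, z_rej, eps=0.1)
        print(f"smoothed={a:.6f} regularized={b:.6f} equal={math.isclose(a, b)}")
```

The identity follows from log sigmoid(delta) - log sigmoid(-delta) = delta. The extra term eps * delta grows with the margin, so it discourages the model from pushing the two label logits arbitrarily far apart.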

Code Implementations: 2 repos