AI, CL · Oct 24, 2024

Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs

Chris Yuhao Liu, Liang Zeng, Jiacai Liu, Rui Yan, Jujie He, Chaojie Wang, Shuicheng Yan, Yang Liu, Yahui Zhou

arXiv:2410.18451v1 · 299 citations · h-index: 14 · Has Code

Originality: Incremental advance

AI Analysis

This work targets efficient preference learning for LLM developers. It appears incremental, however, because it builds on existing reward modeling approaches and adds optimized data curation.

The researchers set out to improve reward modeling for large language models by developing data-centric techniques for curating high-quality preference datasets. The result is the Skywork-Reward collection, which contains only 80K preference pairs yet enabled models to achieve top performance on the RewardBench leaderboard.

In this report, we introduce a collection of methods to enhance reward modeling for LLMs, focusing specifically on data-centric techniques. We propose effective data selection and filtering strategies for curating high-quality open-source preference datasets, culminating in the Skywork-Reward data collection, which contains only 80K preference pairs -- significantly smaller than existing datasets. Using this curated dataset, we developed the Skywork-Reward model series -- Skywork-Reward-Gemma-27B and Skywork-Reward-Llama-3.1-8B -- with the former currently holding the top position on the RewardBench leaderboard. Notably, our techniques and datasets have directly enhanced the performance of many top-ranked models on RewardBench, highlighting the practical impact of our contributions in real-world preference learning applications.
