CL | AI | Feb 24, 2025

Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch

arXiv:2502.17173v3 | 2 citations | h-index: 29 | Has Code | ACL
Originality: Incremental advance
AI Analysis

The paper addresses the problem of aligning LLMs with human preferences in Chinese contexts, an incremental but important contribution for non-English language applications.

The paper tackles the lack of reliable datasets and benchmarks for Chinese reward models (RMs) by introducing CheemsBench, a human-annotated evaluation benchmark, and CheemsPreference, a large-scale preference dataset for training. Their constructed RM achieves state-of-the-art performance on CheemsBench, showing that AI-generated data alone struggles to capture human preferences.

Reward models (RMs) are crucial for aligning large language models (LLMs) with human preferences. However, most RM research is centered on English and relies heavily on synthetic resources, which leads to limited and less reliable datasets and benchmarks for Chinese. To address this gap, we introduce CheemsBench, a fully human-annotated RM evaluation benchmark within Chinese contexts, and CheemsPreference, a large-scale and diverse preference dataset annotated through human-machine collaboration to support Chinese RM training. We systematically evaluate open-source discriminative and generative RMs on CheemsBench and observe significant limitations in their ability to capture human preferences in Chinese scenarios. Additionally, based on CheemsPreference, we construct an RM that achieves state-of-the-art performance on CheemsBench, demonstrating the necessity of human supervision in RM training. Our findings reveal that scaled AI-generated data struggles to fully capture human preferences, emphasizing the importance of high-quality human supervision in RM development.

Code Implementations: 1 repo
