AI LG · Mar 9

CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling

Dengcan Liu, Fengkai Yang, Xiaohan Wang, Shurui Yan, Jiajun Chai, Jiahao Li, Yikun Ban, Zhendong Mao, Wei Lin, Guojun Yin

arXiv:2603.08035v1 · 23.97 citations

Predicted impact: top 5% in AI · last 90 days
Originality: Highly original

AI Analysis

This work provides a more scalable, interpretable, and data-efficient method for reward modeling, which is crucial for aligning LLMs with human preferences, especially for practitioners and researchers dealing with the high cost of expert annotations.

This paper introduces CDRRM, a framework for generating high-quality, interpretable rubrics for reward modeling in LLMs. It uses a Contrast-then-Synthesis paradigm to identify discriminative factors and synthesize them into compact rubrics, achieving state-of-the-art performance on three benchmarks and mitigating evaluation biases. Notably, CDRRM demonstrates exceptional data efficiency, outperforming fully fine-tuned baselines with only 3k training samples for its rubric generator.

Reward modeling is essential for aligning Large Language Models (LLMs) with human preferences, yet conventional reward models suffer from poor interpretability and heavy reliance on costly expert annotations. While recent rubric-based approaches enhance evaluation transparency, they lack systematic quality control, yielding noisy and redundant criteria, failing to mitigate persistent biases (e.g., verbosity, position) in LLM evaluators, and creating a scalability-reliability trade-off. To address these limitations, we propose CDRRM (Contrast-Driven Rubric Reward Model), a framework built on a novel Contrast-then-Synthesis paradigm for high-quality rubric generation and guided preference judgment. CDRRM first conducts multi-dimensional contrastive profiling on preference pairs to identify causal discriminative factors, then synthesizes these insights into compact, context-aware rubrics to guide preference judgments. Extensive experiments on three authoritative benchmarks (RewardBench, RMBench, RMB) demonstrate that CDRRM achieves state-of-the-art performance across diverse domains and effectively mitigates the aforementioned evaluation biases. Notably, our approach delivers exceptional data efficiency: training the rubric generator on only 3k high-quality samples empowers a frozen pre-trained judge model to outperform fully fine-tuned baselines. This work offers a scalable, interpretable, and data-efficient path for reward modeling.
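The Contrast-then-Synthesis flow described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the `LLM` callable interface, the `DIMENSIONS` list, the prompt wording, the function names, and the `toy_llm` stand-in are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


# Assumed profiling dimensions; the abstract does not enumerate them.
DIMENSIONS = ["correctness", "instruction following", "safety"]


def contrastive_profile(pair: PreferencePair, llm: LLM,
                        dimensions: List[str] = DIMENSIONS) -> List[str]:
    """Stage 1 (contrast): per dimension, ask what separates chosen from rejected."""
    factors = []
    for dim in dimensions:
        answer = llm(
            f"Dimension: {dim}\nPrompt: {pair.prompt}\n"
            f"Chosen: {pair.chosen}\nRejected: {pair.rejected}\n"
            "State the decisive difference, or NONE."
        ).strip()
        if answer and answer.upper() != "NONE":
            factors.append(f"{dim}: {answer}")
    return factors


def synthesize_rubric(factors: List[str], llm: LLM, max_items: int = 3) -> List[str]:
    """Stage 2 (synthesis): merge factors into a short, de-duplicated rubric."""
    out = llm("Merge these factors into a compact rubric, one per line:\n"
              + "\n".join(factors))
    rubric: List[str] = []
    for line in out.splitlines():
        line = line.strip()
        if line and line not in rubric:
            rubric.append(line)
    return rubric[:max_items]


def judge(prompt: str, response_a: str, response_b: str,
          rubric: List[str], llm: LLM) -> str:
    """Rubric-guided preference judgment; returns 'A' or 'B'."""
    criteria = "\n".join(f"- {item}" for item in rubric)
    verdict = llm(
        f"Rubric:\n{criteria}\nPrompt: {prompt}\n"
        f"Response A: {response_a}\nResponse B: {response_b}\n"
        "Which response better satisfies the rubric? Answer A or B."
    ).strip().upper()
    return "A" if verdict.startswith("A") else "B"


def toy_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model so the sketch runs offline."""
    if prompt.startswith("Dimension: correctness"):
        return "the answer must give the right sum"
    if prompt.startswith("Dimension:"):
        return "NONE"
    if prompt.startswith("Merge"):
        return "\n".join(prompt.splitlines()[1:])  # echo the factors back
    for line in prompt.splitlines():
        if line.startswith("Response A:"):
            return "A" if "4" in line else "B"
    return "B"


pair = PreferencePair("What is 2 + 2?", "2 + 2 = 4", "2 + 2 = 5")
factors = contrastive_profile(pair, toy_llm)
rubric = synthesize_rubric(factors, toy_llm)
verdict = judge(pair.prompt, pair.chosen, pair.rejected, rubric, toy_llm)
```

In this toy run the rubric is built from the preference pair itself and then handed to the judge, which mirrors the separation the abstract draws between a trained rubric generator and a frozen judge model. Calling `judge` a second time with the two responses swapped is one simple way to probe the position bias the abstract mentions.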
