AdaJudge: Adaptive Multi-Perspective Judging for Reward Modeling
This work targets a bottleneck in aligning AI systems with human preferences, namely the static pooling readout used by most reward models, though the contribution appears incremental because it builds on existing reward modeling frameworks.
The paper tackles reward modeling for aligning large language models with human preferences, focusing on the limitations of static pooling strategies. The resulting framework, AdaJudge, outperforms strong off-the-shelf reward models and traditional pooling baselines on RM-Bench and JudgeBench.
Reward modeling is essential for aligning large language models with human preferences, yet predominant architectures rely on a static pooling strategy to condense sequences into scalar scores. This paradigm, however, suffers from two key limitations: a static inductive bias that misaligns with task-dependent preference signals, and a representational mismatch, as the backbone is optimized for generation rather than fine-grained discrimination. To address these limitations, we propose AdaJudge, a unified framework that jointly adapts representation and aggregation. AdaJudge first refines backbone representations into a discrimination-oriented space via gated refinement blocks. It then replaces the static readout with an adaptive multi-view pooling module that dynamically routes and combines evidence. Extensive experiments on RM-Bench and JudgeBench show that AdaJudge outperforms strong off-the-shelf reward models and traditional pooling baselines.
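The two stages described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify the gate form, which pooling views are used, or what the router reads. The sketch assumes a gated residual update of the form h + sigmoid(W_g h) * tanh(W_u h), three views (last token, mean, max), and a softmax router driven by the mean view. All function and parameter names (`gated_refine`, `adaptive_pool`, `W_u`, `W_g`, `W_r`, `w_out`) are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    es = [math.exp(z - m) for z in zs]
    total = sum(es)
    return [e / total for e in es]


def gated_refine(h, W_u, W_g):
    """Assumed gated residual update: h + sigmoid(W_g h) * tanh(W_u h)."""
    update = [math.tanh(v) for v in matvec(W_u, h)]
    gate = [sigmoid(v) for v in matvec(W_g, h)]
    return [hi + gi * ui for hi, gi, ui in zip(h, gate, update)]


def pooling_views(H):
    """Three assumed views of a token sequence: last token, mean, max."""
    d = len(H[0])
    last = H[-1]
    mean = [sum(h[j] for h in H) / len(H) for j in range(d)]
    mx = [max(h[j] for h in H) for j in range(d)]
    return [last, mean, mx]


def adaptive_pool(H, W_r):
    """Mix the views with input-dependent softmax weights."""
    views = pooling_views(H)
    # Assumption: the router reads the mean view and emits one logit per view.
    weights = softmax(matvec(W_r, views[1]))
    d = len(H[0])
    pooled = [sum(w * v[j] for w, v in zip(weights, views)) for j in range(d)]
    return pooled, weights


def reward(H, params):
    """Refine each token vector, pool adaptively, project to a scalar score."""
    refined = [gated_refine(h, params["W_u"], params["W_g"]) for h in H]
    pooled, weights = adaptive_pool(refined, params["W_r"])
    score = sum(w * p for w, p in zip(params["w_out"], pooled))
    return score, weights


def init_params(d, n_views=3, seed=0):
    """Small random parameters, for illustration only (untrained)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_u": mat(d, d),
        "W_g": mat(d, d),
        "W_r": mat(n_views, d),
        "w_out": [rng.gauss(0.0, 0.1) for _ in range(d)],
    }


if __name__ == "__main__":
    # Toy "backbone hidden states": 5 tokens, hidden size 4.
    H = [[0.1 * i + j for j in range(4)] for i in range(5)]
    score, weights = reward(H, init_params(4))
    print("score:", score)
    print("view weights (last, mean, max):", weights)
```

A static readout corresponds to fixing the view weights in advance, for example always putting all weight on the last token. Here the weights depend on the input, so different responses can draw on different views, which is the behaviour the abstract describes as dynamically routing and combining evidence.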