Calibrated Preference Learning: The Case of Label Ranking

Santo M. A. R. Thies, Viktor Bengs, Timo Kaufmann, Sebastian J. Vollmer, Eyke Hüllermeier

arXiv:2605.30447

AI Analysis

This work addresses the problem of formally defining and evaluating calibration for probabilistic label ranking models, which is important for reliable decision-making in applications like RLHF reward modeling.

This paper formalizes calibration for probabilistic label ranking, where a model predicts a distribution over label orderings. It introduces a hierarchy of calibration notions for full rankings, sub-rankings, and top-k rankings, and proves how they relate. The authors find that popular label ranking models are often poorly calibrated and that, for RLHF reward models, calibration correlates strongly but not perfectly with benchmark accuracy.

Calibration, the alignment of predicted probabilities with true outcome frequencies, is essential for reliable decision-making. While extensively studied for classification and regression, calibration has not been formally addressed for probabilistic label ranking, where the goal is to predict a distribution over orderings of a label set. Naively treating rankings as classes ignores their structure and fails to capture important modalities such as pairwise and top-k predictions. We formalize calibration for label ranking and develop a hierarchy of notions covering full rankings, sub-rankings, and top-k rankings. We prove that full-rank calibration implies the others but not conversely, and that sub-ranking and top-k calibration are incomparable. Empirically, we find that popular label ranking models are often poorly calibrated, with substantial differences between sub-ranking and top-k metrics. Applying our framework to RLHF reward models, we find that calibration correlates strongly but not perfectly with benchmark accuracy, suggesting that it captures a meaningful quality dimension beyond top-1 accuracy. These findings motivate future work on understanding the downstream effects of miscalibration and developing methods to correct it.
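To make the "modalities" mentioned in the abstract concrete, the following minimal Python sketch shows how pairwise and top-k predictions can be read off a predicted distribution over full rankings by marginalization. It is an illustration only, not the authors' code: the function names `pairwise_marginal` and `top_k_marginal`, the toy distribution `dist`, and the choice to treat a top-k ranking as the ordered length-k prefix of a full ranking are all assumptions made here.

```python
def pairwise_marginal(dist, a, b):
    """Probability that label a is ranked before label b.

    dist maps full rankings (tuples of labels, best first) to probabilities.
    """
    return sum(p for ranking, p in dist.items()
               if ranking.index(a) < ranking.index(b))


def top_k_marginal(dist, k):
    """Distribution over the ordered top-k prefix of the ranking.

    Assumption: a top-k ranking is the first k positions, order preserved.
    """
    out = {}
    for ranking, p in dist.items():
        prefix = ranking[:k]
        out[prefix] = out.get(prefix, 0.0) + p
    return out


# Hypothetical predicted distribution over orderings of three labels.
dist = {
    ("A", "B", "C"): 0.4,
    ("A", "C", "B"): 0.2,
    ("B", "A", "C"): 0.2,
    ("C", "B", "A"): 0.2,
}

p_a_before_b = pairwise_marginal(dist, "A", "B")  # mass of rankings with A ahead of B
top1 = top_k_marginal(dist, 1)                    # distribution over the top-ranked label
```

Because both quantities are sums over the full-ranking distribution, a notion of calibration stated on full rankings constrains them, which is in line with the abstract's claim that full-rank calibration implies the weaker notions.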
