Rho-Perfect: Correlation Ceiling For Subjective Evaluation Datasets

arXiv:2602.08552v11.4

Originality: Incremental advance

AI Analysis

This work addresses the reliability of subjective evaluation datasets, a concern for both researchers and practitioners. It is an incremental advance because it builds on existing noise models.

Inherent noise in subjective ratings limits how well any model can correlate with human judgments. The paper introduces Rho-Perfect, a method for estimating the highest correlation achievable on such datasets. It applies the method to a speech quality dataset to separate model limitations from data quality issues.

Subjective ratings contain inherent noise that limits the model-human correlation, but this reliability issue is rarely quantified. In this paper, we present $ρ$-Perfect, a practical estimation of the highest achievable correlation of a model on subjectively rated datasets. We define $ρ$-Perfect to be the correlation between a perfect predictor and human ratings, and derive an estimate of the value based on heteroscedastic noise scenarios, a common occurrence in subjectively rated datasets. We show that $ρ$-Perfect squared estimates test-retest correlation and use this to validate the estimate. We demonstrate the use of $ρ$-Perfect on a speech quality dataset and show how the measure can distinguish between model limitations and data quality issues.
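The claim that $ρ$-Perfect squared estimates test-retest correlation can be illustrated with a small simulation. The sketch below is not the paper's estimator. It assumes additive zero-mean noise with a per-item standard deviation (one simple form of heteroscedasticity), a fixed number of raters per item, and a hypothetical plug-in estimate, `plug_in_ceiling`, taken from classical test theory. All function names, distributions, and parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_ratings(n_items=5000, n_raters=8, seed=0):
    """Simulate two independent rating sessions over the same items.

    Assumption: each rating is the item's true score plus zero-mean Gaussian
    noise whose standard deviation differs per item (heteroscedastic noise).
    """
    rng = np.random.default_rng(seed)
    true_scores = rng.normal(3.0, 0.8, size=n_items)
    noise_sd = rng.uniform(0.3, 1.5, size=n_items)

    def session():
        noise = noise_sd[:, None] * rng.standard_normal((n_items, n_raters))
        return true_scores[:, None] + noise

    return true_scores, session(), session()


def plug_in_ceiling(ratings):
    """Hypothetical plug-in estimate of the correlation ceiling.

    Uses one session only: the average squared standard error of the item
    means is treated as noise variance, and the ceiling is the square root
    of the resulting reliability. This is an illustrative stand-in, not the
    estimator derived in the paper.
    """
    n_raters = ratings.shape[1]
    item_means = ratings.mean(axis=1)
    sem_squared = ratings.var(axis=1, ddof=1) / n_raters
    reliability = 1.0 - sem_squared.mean() / item_means.var(ddof=1)
    return float(np.sqrt(max(reliability, 0.0)))


true_scores, first, second = simulate_ratings()
mean_first = first.mean(axis=1)
mean_second = second.mean(axis=1)

# Correlation of a perfect predictor (the true scores) with the mean ratings.
rho_oracle = float(np.corrcoef(true_scores, mean_first)[0, 1])
# Test-retest correlation between the two independent sessions.
test_retest = float(np.corrcoef(mean_first, mean_second)[0, 1])
# Ceiling estimated from a single session, without access to true scores.
rho_estimate = plug_in_ceiling(first)

print(f"oracle rho:          {rho_oracle:.3f}")
print(f"oracle rho squared:  {rho_oracle ** 2:.3f}")
print(f"test-retest:         {test_retest:.3f}")
print(f"plug-in estimate:    {rho_estimate:.3f}")
```

Under these assumptions, the squared oracle correlation and the test-retest correlation should come out close to each other, and the single-session estimate should come out close to the oracle. A model whose correlation with the mean ratings sits near the estimate is limited mainly by the data, while one that falls well below it still has room to improve.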
