RBCorr: Response Bias Correction in Language Models
RBCorr addresses inaccurate evaluation of model abilities caused by response biases, enabling more reliable benchmarking for researchers and practitioners. The contribution is incremental, since it builds on existing correction methods.
The paper tackles response biases in language models, which appear as option preference biases in fixed-response questions. It proposes RBCorr, a simple correction strategy that is reported to effectively eliminate bias and boost performance across 12 open-weight models on yes-no, entailment, and multiple-choice questions.
Language models (LMs) are known to be prone to response biases, which present as option preference biases in fixed-response questions. It is therefore imperative to develop low-cost and effective response bias correction methods to improve LM performance and enable more accurate evaluations of model abilities. Here, we propose a simple response bias correction strategy ($\texttt{RBCorr}$) and test it on 12 open-weight language models using yes-no, entailment, and multiple choice questions. We show that response bias is prevalent in LMs pre-correction and that $\texttt{RBCorr}$ effectively eliminates bias and boosts model performance. We also explore the generalizability of bias behavior across models, datasets, and prompt formats, showing that LogProbs-based correction is highly dependent on all three of these aspects. Overall, $\texttt{RBCorr}$ is an easy-to-use method that can boost the performance of smaller LMs and ensure that LM performance on closed-response benchmarks aligns more closely with their true capabilities.
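To make the idea of LogProbs-based correction concrete, here is a minimal sketch of one generic way such a correction could work. It is not taken from the paper and should not be read as the RBCorr procedure itself. The function names (estimate_option_bias, predict), the zero-centring of the offsets, and the toy numbers are all illustrative assumptions. The sketch also assumes a calibration set that is balanced across gold labels, so that a systematically higher mean log-probability for one option can be treated as preference bias rather than signal.

```python
from typing import Dict, List, Optional

# Hypothetical helper names; the per-option mean-offset scheme below is an
# assumption made for illustration, not the method described in the paper.


def estimate_option_bias(calibration: List[Dict[str, float]]) -> Dict[str, float]:
    """Estimate a per-option offset from option log-probabilities.

    Each item maps option label -> log-probability the model assigned to it.
    The offset is the option's mean log-probability minus the mean over all
    options, so the offsets sum to zero.
    """
    options = list(calibration[0].keys())
    n = len(calibration)
    means = {o: sum(item[o] for item in calibration) / n for o in options}
    grand_mean = sum(means.values()) / len(options)
    return {o: means[o] - grand_mean for o in options}


def predict(logprobs: Dict[str, float],
            bias: Optional[Dict[str, float]] = None) -> str:
    """Return the highest-scoring option, after subtracting the offset if given."""
    bias = bias or {}
    return max(logprobs, key=lambda o: logprobs[o] - bias.get(o, 0.0))


# Toy yes-no calibration data (made up) in which the model leans towards "yes".
calibration_set = [
    {"yes": -0.2, "no": -1.8},
    {"yes": -0.6, "no": -1.0},
    {"yes": -0.3, "no": -1.5},
    {"yes": -0.9, "no": -0.7},
]
option_bias = estimate_option_bias(calibration_set)

# A borderline item: the raw scores favour "yes", the corrected scores do not.
item = {"yes": -0.6, "no": -1.0}
raw_answer = predict(item)
corrected_answer = predict(item, option_bias)
```

Because the abstract reports that LogProbs-based correction depends heavily on the model, dataset, and prompt format, offsets like these would presumably need to be re-estimated for each such combination rather than reused across them.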