CL · CY · Aug 2, 2024

The Mismeasure of Man and Models: Evaluating Allocational Harms in Large Language Models

arXiv:2408.01285v1 · 2 citations · h-index: 5

Originality: Incremental advance

AI Analysis

This work addresses a gap in bias measurement for LLMs used in allocation decisions, such as recruitment and clinical settings. Closing this gap is crucial for mitigating harms in resource-constrained contexts.

The paper tackles the problem of measuring bias in large language models (LLMs) used for high-stakes decision-making. It introduces the Rank-Allocational-Based Bias Index (RABBI), which assesses allocational harms and correlates strongly with allocation disparities, whereas existing metrics fail to reliably capture group disparities.
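
The summary does not state how RABBI is computed. As a hedged illustration of what a rank-based group comparison can look like, the sketch below assumes a pairwise formulation: score every candidate with the model, then compare each candidate from group A with each candidate from group B. The function name `rank_bias_index` and the pairwise scoring rule are assumptions for illustration, not the paper's definition of RABBI.

```python
from itertools import product


def rank_bias_index(scores_a, scores_b):
    """Hypothetical pairwise rank-based bias score between two groups.

    Every group-A candidate is compared with every group-B candidate:
    +1 if the A candidate is scored higher, -1 if lower, 0 on a tie.
    The mean over all pairs lies in [-1, 1]; 0 means neither group is
    systematically ranked above the other.
    """
    if not scores_a or not scores_b:
        raise ValueError("each group needs at least one score")
    total = 0
    for a, b in product(scores_a, scores_b):
        # (a > b) - (a < b) evaluates to 1, -1, or 0 for win, loss, or tie
        total += (a > b) - (a < b)
    return total / (len(scores_a) * len(scores_b))


# Toy scores invented for illustration (not data from the paper).
print(rank_bias_index([0.9, 0.8, 0.7], [0.6, 0.5, 0.4]))  # 1.0
print(rank_bias_index([0.9, 0.1], [0.5, 0.5]))            # 0.0
```

Because this score depends only on the relative order of candidates, applying the same strictly increasing rescaling to all model outputs leaves it unchanged. That property suits settings where candidates are ranked and only the top few are selected.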

Large language models (LLMs) are now being considered and even deployed for applications that support high-stakes decision-making, such as recruitment and clinical decisions. While several methods have been proposed for measuring bias, there remains a gap between the predictions these methods evaluate and how those predictions are used to make decisions. In this work, we introduce the Rank-Allocational-Based Bias Index (RABBI), a model-agnostic bias measure that assesses potential allocational harms arising from biases in LLM predictions. We compare RABBI with current bias metrics on two allocation decision tasks, evaluating their predictive validity across ten LLMs and their utility for model selection. Our results reveal that commonly used bias metrics based on average performance gaps and distribution distances fail to reliably capture group disparities in allocation outcomes, whereas RABBI exhibits a strong correlation with allocation disparities. Our work highlights the need to account for how models are used in resource-constrained contexts.
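
The abstract's central point, that a metric based on average performance gaps can miss disparities in who actually receives a limited resource, can be shown with a toy example. The sketch assumes a simple top-k allocation rule in which only the k highest-scoring candidates are selected. The functions `mean_gap` and `top_k_selection_rates` and the scores are hypothetical, not the paper's tasks, metrics, or data.

```python
def mean_gap(scores_a, scores_b):
    """Average-score gap between groups (a stand-in for average-performance metrics)."""
    return sum(scores_a) / len(scores_a) - sum(scores_b) / len(scores_b)


def top_k_selection_rates(scores_a, scores_b, k):
    """Share of each group selected when only the k highest scores are chosen.

    Hypothetical allocation rule: pool both groups, sort by score, keep the
    top k. Python's sort is stable, so cross-group ties favor group A here.
    """
    if not 0 < k <= len(scores_a) + len(scores_b):
        raise ValueError("k must be between 1 and the number of candidates")
    pooled = [(s, "A") for s in scores_a] + [(s, "B") for s in scores_b]
    pooled.sort(key=lambda item: item[0], reverse=True)
    selected = pooled[:k]
    n_a = sum(1 for _, group in selected if group == "A")
    n_b = k - n_a
    return n_a / len(scores_a), n_b / len(scores_b)


# Toy scores on a 0-100 scale, invented for illustration.
group_a = [95, 90, 20, 15]
group_b = [60, 55, 55, 50]
print(mean_gap(group_a, group_b))                  # 0.0
print(top_k_selection_rates(group_a, group_b, 2))  # (0.5, 0.0)
```

Here the two groups have identical mean scores, yet with a budget of k = 2 every selected candidate comes from group A. Raising the budget to k = 4 selects two candidates from each group, so the observed disparity depends on the resource constraint itself, not only on the model's average behavior.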
