LG · AI · CL · Oct 15, 2024

Bias Similarity Measurement: A Black-Box Audit of Fairness Across LLMs

Hyejun Jeong, Shiqing Ma, Amir Houmansadr

arXiv:2410.12010v4 · 6.42 citations · h-index: 5 · Has Code

Originality: Highly original

AI Analysis

This work provides a systematic auditing workflow for LLM ecosystems. It evaluates bias across models rather than in isolation, which matters for procurement and regression testing in AI development.

The researchers address the problem of measuring how bias persists across different LLMs. They introduce Bias Similarity Measurement (BSM), which evaluates fairness as a relational property between models. They find that instruction tuning primarily enforces abstention rather than altering internal representations, and that Gemma 3 Instruct matches GPT-4-level fairness at far lower cost.
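The claim that instruction tuning "enforces abstention" is easier to see once a model's answers are reduced to a few numbers. The sketch below is an illustration only, not the paper's pipeline. It assumes a hypothetical multiple-choice bias prompt set where each answer is a group label ("A" or "B") or an abstention ("UNKNOWN"), and it summarizes one model by accuracy, abstention rate, and answer distribution. The names `profile`, `Profile`, and `ABSTAIN` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical label scheme (assumption): "A"/"B" pick a social group,
# "UNKNOWN" means the model abstained from choosing.
ABSTAIN = "UNKNOWN"


@dataclass
class Profile:
    accuracy: float         # fraction of answers equal to the gold label
    abstention_rate: float  # fraction of answers that abstain
    distribution: dict      # frequency of each label in `labels`


def profile(answers, gold, labels=("A", "B", ABSTAIN)):
    """Summarize one model's answers on a hypothetical bias prompt set.

    Answers outside `labels` still count toward n, so the distribution
    sums to 1 only when every answer is one of `labels`.
    """
    if not answers or len(answers) != len(gold):
        raise ValueError("answers and gold must be non-empty and equal length")
    n = len(answers)
    correct = sum(a == g for a, g in zip(answers, gold))
    counts = Counter(answers)
    return Profile(
        accuracy=correct / n,
        abstention_rate=counts[ABSTAIN] / n,
        distribution={lab: counts[lab] / n for lab in labels},
    )


if __name__ == "__main__":
    # Toy data (not from the paper): a base model and its instruct variant.
    gold = ["UNKNOWN", "A", "A", "B"]
    base = profile(["A", "A", "A", "UNKNOWN"], gold)
    inst = profile(["UNKNOWN", "UNKNOWN", "A", "UNKNOWN"], gold)
    print(base.accuracy, base.abstention_rate)  # 0.5 0.25
    print(inst.accuracy, inst.abstention_rate)  # 0.5 0.75
```

In this toy case the instruct variant triples its abstention rate while accuracy stays flat, which is the kind of pattern the summary describes; real conclusions of course depend on the paper's actual prompts and metrics.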

Large Language Models (LLMs) reproduce social biases, yet prevailing evaluations score models in isolation, obscuring how biases persist across families and releases. We introduce Bias Similarity Measurement (BSM), which treats fairness as a relational property between models, unifying scalar, distributional, behavioral, and representational signals into a single similarity space. Evaluating 30 LLMs on 1M+ prompts, we find that instruction tuning primarily enforces abstention rather than altering internal representations; small models gain little accuracy and can become less fair under forced choice; and open-weight models can match or exceed proprietary systems. Family signatures diverge: Gemma favors refusal, LLaMA 3.1 approaches neutrality with fewer refusals, and converges toward abstention-heavy behavior overall. Counterintuitively, Gemma 3 Instruct matches GPT-4-level fairness at far lower cost, whereas Gemini's heavy abstention suppresses utility. Beyond these findings, BSM offers an auditing workflow for procurement, regression testing, and lineage screening, and extends naturally to code and multilingual settings. Our results reframe fairness not as isolated scores but as comparative bias similarity, enabling systematic auditing of LLM ecosystems. Code available at https://github.com/HyejunJeong/bias_llm.
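The abstract says BSM places models in a single similarity space but does not give the metrics on this page. As an assumption, the sketch below illustrates only the distributional signal. It compares per-model answer distributions with Jensen-Shannon divergence (a standard choice swapped in here, not necessarily the paper's) and turns it into a similarity of 1 - JSD. It then finds each model's nearest neighbor, which is the kind of query lineage screening or regression testing would ask. All model names and functions are hypothetical.

```python
import math


def kl_bits(p, q):
    """KL divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # m_i > 0 wherever p_i > 0
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)


def similarity_matrix(dists):
    """Pairwise similarity 1 - JSD over {model name: answer distribution}."""
    names = sorted(dists)
    return {
        (a, b): 1.0 - js_divergence(dists[a], dists[b])
        for a in names
        for b in names
    }


def most_similar(name, sim):
    """Return the other model whose answer distribution is closest to `name`."""
    candidates = [(s, b) for (a, b), s in sim.items() if a == name and b != name]
    return max(candidates)[1]


if __name__ == "__main__":
    # Toy distributions over ("A", "B", "UNKNOWN"); not data from the paper.
    dists = {
        "model_a": [0.75, 0.00, 0.25],
        "model_b": [0.25, 0.00, 0.75],
        "model_c": [0.70, 0.05, 0.25],
    }
    sim = similarity_matrix(dists)
    print(most_similar("model_a", sim))  # model_c
```

Jensen-Shannon divergence is used here because it is symmetric and bounded, so 1 - JSD behaves like a similarity score. A full comparison in the spirit of the abstract would also combine scalar, behavioral, and representational signals, which this sketch omits.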

