A Scalable Entity-Based Framework for Auditing Bias in LLMs
For researchers and practitioners deploying LLMs in high-stakes applications, this work provides a rigorous, scalable method to audit biases and reveals systematic disparities that need to be addressed.
This paper introduces a scalable bias-auditing framework that uses named entities as controlled probes, and applies it in the largest bias audit to date, comprising 1.9 billion data points. It finds consistent political, geographic, and corporate biases in LLMs; instruction tuning reduces these biases, whereas increasing model scale amplifies them.
Existing approaches to bias evaluation in large language models (LLMs) trade off ecological validity against statistical control: they rely either on artificial prompts that poorly reflect real-world use or on naturalistic tasks that lack scale and rigor. We introduce a scalable bias-auditing framework that uses named entities as controlled probes to measure systematic disparities in model behavior. Synthetic data enables us to construct diverse, controlled inputs, and we show that it reliably reproduces bias patterns observed in natural text, supporting its use for large-scale analysis. Using this framework, we conduct the largest bias audit to date, comprising 1.9 billion data points across multiple entity types, tasks, languages, models, and prompting strategies. We find consistent patterns: models penalize right-wing politicians and favor left-wing politicians, prefer Western and wealthier countries over the Global South, favor Western companies, and penalize firms in the defense and pharmaceutical sectors. Instruction tuning reduces bias, but increasing model scale amplifies it, and prompting in Chinese or Russian does not mitigate Western-aligned preferences. These findings highlight the need for systematic bias auditing before deploying LLMs in high-stakes applications. Our framework is extensible to other domains and tasks, and we make it publicly available to support future work.
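The idea of using named entities as controlled probes can be illustrated with a minimal sketch. It is not the authors' implementation: the templates, group labels, entity names, and the `toy_score` function are all hypothetical placeholders, and a real audit would replace `toy_score` with a call to the model under test. The sketch holds the surrounding text fixed, varies only the entity, and compares mean scores across entity groups.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical templates; "{entity}" is the only slot that varies,
# so any score difference is attributable to the entity name.
TEMPLATES = [
    "{entity} announced a new policy today.",
    "Analysts discussed the latest statement from {entity}.",
]


def build_probes(templates: List[str], entities: List[str]) -> List[Tuple[str, str]]:
    """Return (entity, text) pairs: every entity placed in every template."""
    return [(e, t.format(entity=e)) for t in templates for e in entities]


def audit(
    score_fn: Callable[[str], float],
    templates: List[str],
    groups: Dict[str, List[str]],
) -> Dict[str, float]:
    """Mean score per group; `groups` maps a group label to entity names."""
    per_group = {}
    for label, entities in groups.items():
        probes = build_probes(templates, entities)
        per_group[label] = mean(score_fn(text) for _, text in probes)
    return per_group


def toy_score(text: str) -> float:
    """Stand-in for a model call (e.g. a sentiment rating). NOT a real model:
    it is deliberately biased toward one entity so the gap is visible."""
    return 1.0 if "Alpha" in text else 0.0


# Hypothetical entity groups, for illustration only.
groups = {"group_a": ["Alpha Corp"], "group_b": ["Beta Corp"]}
result = audit(toy_score, TEMPLATES, groups)
gap = result["group_a"] - result["group_b"]  # disparity between the two groups
```

Because the templates are identical across groups, a nonzero `gap` reflects only how the scorer treats the entity names. Scaling this pattern up means adding more templates, entities, tasks, and languages while leaving the comparison logic unchanged.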