Discovering Bias Associations through Open-Ended LLM Generations
This work addresses representational harms in AI by giving researchers and practitioners a scalable tool for uncovering unexpected biases, though the contribution is incremental because it builds on existing bias evaluation methods.
The paper tackles the problem of identifying subtle social biases in Large Language Models (LLMs). It develops the Bias Association Discovery Framework (BADF), which extracts both known and new associations between demographic identities and descriptive concepts from open-ended outputs and maps those associations across multiple models and contexts.
Social biases embedded in Large Language Models (LLMs) raise critical concerns, resulting in representational harms -- unfair or distorted portrayals of demographic groups -- that may be expressed in subtle ways through generated language. Existing evaluation methods often depend on predefined identity-concept associations, limiting their ability to surface new or unexpected forms of bias. In this work, we present the Bias Association Discovery Framework (BADF), a systematic approach for extracting both known and previously unrecognized associations between demographic identities and descriptive concepts from open-ended LLM outputs. Through comprehensive experiments spanning multiple models and diverse real-world contexts, BADF enables robust mapping and analysis of the varied concepts that characterize demographic identities. Our findings advance the understanding of biases in open-ended generation and provide a scalable tool for identifying and analyzing bias associations in LLMs. Data, code, and results are available at https://github.com/JP-25/Discover-Open-Ended-Generation
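To make the idea of separating known from previously unrecognized identity-concept associations concrete, here is a minimal toy sketch. It is not BADF itself: the abstract does not specify the extraction procedure, so the function name `discover_associations`, the lexicon-matching step, the `known_pairs` set, and the placeholder identities `group_a` and `group_b` are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def discover_associations(generations, concept_lexicon, known_pairs):
    """Toy illustration only (hypothetical helper, not the paper's method).

    generations: iterable of (identity, generated_text) pairs.
    concept_lexicon: set of lowercase descriptive concepts to look for.
    known_pairs: set of (identity, concept) pairs already covered by a
        predefined benchmark (assumed to be supplied by the user).
    Returns two lists of (identity, concept, count): known and new.
    """
    counts = defaultdict(Counter)
    for identity, text in generations:
        # Count each concept at most once per generation.
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        for concept in concept_lexicon & tokens:
            counts[identity][concept] += 1

    known, new = [], []
    for identity, counter in counts.items():
        for concept, n in counter.items():
            bucket = known if (identity, concept) in known_pairs else new
            bucket.append((identity, concept, n))
    return sorted(known), sorted(new)


# Placeholder data; real inputs would be open-ended LLM outputs.
generations = [
    ("group_a", "The character was described as caring and quiet."),
    ("group_a", "A caring neighbor who helps everyone."),
    ("group_b", "The character was described as logical."),
]
known, new = discover_associations(
    generations,
    concept_lexicon={"caring", "quiet", "logical"},
    known_pairs={("group_a", "caring")},
)
```

In this sketch, any pair found in the generations but absent from `known_pairs` lands in `new`, which mirrors the abstract's point that predefined association lists can miss unexpected forms of bias. A real pipeline would likely need a more robust concept extraction step than exact word matching.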