BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs
BiasAlert addresses a pressing need among developers and researchers for reliable bias evaluation in LLMs, although, as an improved tool for an existing problem, the contribution appears incremental.
The paper tackles the problem of detecting social bias in the open-text generations of large language models, where existing methods fail because they rely on fixed-form outputs. The proposed BiasAlert tool significantly outperforms state-of-the-art methods such as GPT4-as-A-Judge in bias detection.
Evaluating bias in Large Language Models (LLMs) is becoming increasingly crucial as these models develop rapidly. However, existing evaluation methods rely on fixed-form outputs and cannot adapt to the flexible open-text generation scenarios of LLMs (e.g., sentence completion and question answering). To address this, we introduce BiasAlert, a plug-and-play tool designed to detect social bias in open-text generations of LLMs. BiasAlert integrates external human knowledge with inherent reasoning capabilities to detect bias reliably. Extensive experiments demonstrate that BiasAlert significantly outperforms existing state-of-the-art methods such as GPT4-as-A-Judge in detecting bias. Furthermore, through application studies, we demonstrate the utility of BiasAlert in reliable LLM bias evaluation and bias mitigation across various scenarios. Model and code will be publicly released.
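The abstract does not describe BiasAlert's internals. As a purely hypothetical illustration of the general pattern it alludes to (retrieve relevant external human knowledge, then reason over it to decide whether a generation is biased), the sketch below uses a toy knowledge base, naive word-overlap retrieval, and a rule-based stub in place of the reasoning model. None of the names here (`BiasEntry`, `retrieve`, `judge`, `detect_bias`, `KB`) come from the paper, and this is not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BiasEntry:
    # Hypothetical record in a human-annotated social bias knowledge base.
    group: str
    stereotype: str


# Toy knowledge base (assumption: real systems would use a large annotated corpus).
KB = [
    BiasEntry("women", "women are bad at math"),
    BiasEntry("elderly", "elderly people cannot learn new technology"),
]


def tokenize(text: str) -> set[str]:
    # Lowercase and strip simple punctuation; a stand-in for a real retriever's encoder.
    words = (w.strip(".,!?;:\"'").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query: str, kb: list[BiasEntry], k: int = 2) -> list[BiasEntry]:
    # Rank entries by word overlap with the query and keep the top k that overlap at all.
    q = tokenize(query)
    def score(e: BiasEntry) -> int:
        return len(q & tokenize(e.group + " " + e.stereotype))
    ranked = sorted(kb, key=score, reverse=True)
    return [e for e in ranked[:k] if score(e) > 0]


def judge(generation: str, refs: list[BiasEntry], threshold: int = 2):
    # Stub for the reasoning step: a real tool would prompt a model with the references.
    # Flags the text if it mentions a referenced group and echoes enough of its stereotype.
    g = tokenize(generation)
    for ref in refs:
        if tokenize(ref.group) & g and len(g & tokenize(ref.stereotype)) >= threshold:
            return True, ref
    return False, None


def detect_bias(generation: str, kb: list[BiasEntry] = KB):
    # Retrieve-then-judge pipeline over a single open-text generation.
    return judge(generation, retrieve(generation, kb))


if __name__ == "__main__":
    print(detect_bias("Women are bad at math, so she failed."))
    print(detect_bias("The weather is sunny today."))
```

The point of separating `retrieve` from `judge` is that the knowledge source and the reasoning component can be swapped independently, which is one plausible reading of "plug-and-play"; word overlap is only a placeholder for a proper retriever.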