Structured Reasoning for Fairness: A Multi-Agent Approach to Bias Detection in Textual Data
This paper addresses bias detection as a route to fairness and accountability in language models, though the contribution appears incremental because it builds on existing datasets and methods.
The paper tackles the problem of textual bias in large language models by proposing a multi-agent framework that systematically identifies biases through fact/opinion disentanglement, bias intensity scoring, and factual justifications. On 1,500 samples from the WikiNPOV dataset, it achieves 84.9% accuracy, a 13.0% improvement over a zero-shot baseline.
From disinformation spread by AI chatbots to AI recommendations that inadvertently reinforce stereotypes, textual bias poses a significant challenge to the trustworthiness of large language models (LLMs). In this paper, we propose a multi-agent framework that systematically identifies biases by disentangling each statement as fact or opinion, assigning a bias intensity score, and providing concise, factual justifications. Evaluated on 1,500 samples from the WikiNPOV dataset, the framework achieves 84.9% accuracy, an improvement of 13.0% over the zero-shot baseline, demonstrating the efficacy of explicitly modeling fact versus opinion prior to quantifying bias intensity. By combining enhanced detection accuracy with interpretable explanations, this approach sets a foundation for promoting fairness and accountability in modern language models.
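The three stages named in the abstract (fact/opinion labeling, bias intensity scoring, justification) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the agents, prompts, or score scale, so each "agent" below is a hypothetical keyword-based stand-in for what would presumably be an LLM call. The names `OPINION_CUES`, `BiasReport`, `classify_agent`, `intensity_agent`, `justification_agent`, and `detect_bias`, and the 0.0 to 1.0 intensity range, are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical cue list; a real agent would query a language model instead.
OPINION_CUES = {"best", "worst", "terrible", "clearly", "obviously", "brilliant"}


@dataclass
class BiasReport:
    statement: str
    label: str          # "fact" or "opinion"
    intensity: float    # assumed scale: 0.0 (neutral) to 1.0 (strongly biased)
    justification: str


def _cues(statement: str) -> list[str]:
    """Return the subjective cue words found in the statement, in order."""
    words = [w.strip(".,;:!?\"'").lower() for w in statement.split()]
    return [w for w in words if w in OPINION_CUES]


def classify_agent(statement: str) -> str:
    """Stage 1: label the statement as fact or opinion."""
    return "opinion" if _cues(statement) else "fact"


def intensity_agent(statement: str, label: str) -> float:
    """Stage 2: score bias intensity, conditioned on the stage 1 label."""
    if label == "fact":
        return 0.0
    # Toy heuristic: three or more cue words saturate the score at 1.0.
    return min(1.0, len(_cues(statement)) / 3)


def justification_agent(statement: str, label: str) -> str:
    """Stage 3: give a short justification for the label and score."""
    if label == "fact":
        return "No subjective cue words found."
    return "Subjective cue words: " + ", ".join(_cues(statement))


def detect_bias(statement: str) -> BiasReport:
    """Run the three stages in sequence and collect their outputs."""
    label = classify_agent(statement)
    intensity = intensity_agent(statement, label)
    justification = justification_agent(statement, label)
    return BiasReport(statement, label, intensity, justification)


if __name__ == "__main__":
    print(detect_bias("The city was founded in 1820."))
    print(detect_bias("He is clearly the worst mayor."))
```

The point of the sketch is the ordering: the intensity stage receives the fact/opinion label as input, mirroring the abstract's claim that modeling fact versus opinion before quantifying bias intensity is what helps.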