Feb 14, 2023

BiasTestGPT: Using ChatGPT for Social Bias Testing of Language Models

UW
arXiv:2302.07371v3 · 11 citations · h-index: 78 · Has Code
AI Analysis

This work addresses the cumbersome and limited process of social bias testing for language models, and it enables domain experts to detect biases more effectively and at scale. The contribution is incremental, however: it builds on existing bias testing methods and automates only the generation of test sentences.

The authors tackle the problem of detecting social biases in pretrained language models (PLMs) by using ChatGPT to generate diverse test sentences automatically. These generated sentences detected bias better than template-based methods, especially for intersectional biases, and the resulting tool improved domain experts' awareness of biases in PLMs.
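The generation step starts from user-chosen social groups (which may be intersectional, such as "Black women") and attributes, and asks the generator for sentences that mention both. The following is a minimal sketch of how such requests might be enumerated and how generated sentences might be filtered. The prompt wording, the function names `build_requests` and `keep_sentence`, and the substring-based filter are all hypothetical illustrations; the paper's actual prompts and filtering are not reproduced here, and no ChatGPT call is made.

```python
from itertools import product

# Hypothetical prompt wording; not the prompt used by BiasTestGPT.
PROMPT_TEMPLATE = "Write a natural sentence that mentions '{group}' and '{attribute}'."


def build_requests(groups, attributes):
    """Return one generation prompt per (group, attribute) combination."""
    return [
        {
            "group": group,
            "attribute": attribute,
            "prompt": PROMPT_TEMPLATE.format(group=group, attribute=attribute),
        }
        for group, attribute in product(groups, attributes)
    ]


def keep_sentence(sentence, group, attribute):
    """Assumed filter: keep a generated sentence only if it contains both
    the group term and the attribute term (case-insensitive substring match)."""
    text = sentence.lower()
    return group.lower() in text and attribute.lower() in text


# Usage: intersectional groups crossed with two attributes give 4 requests.
groups = ["Black women", "White men"]
attributes = ["competent", "emotional"]
requests = build_requests(groups, attributes)
print(len(requests))
print(requests[0]["prompt"])
print(keep_sentence("Many Black women in the study were rated as competent.",
                    "Black women", "competent"))
```

Crossing groups with attributes via `itertools.product` makes the number of requests grow multiplicatively, which is one reason automatic generation scales better than hand-written templates for intersectional settings.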

Pretrained Language Models (PLMs) harbor inherent social biases that can have harmful real-world consequences. Such social biases are measured through the probability values that PLMs output for different social groups and attributes appearing in a set of test sentences. However, bias testing is currently cumbersome, since the test sentences are either generated from a limited set of manual templates or require expensive crowd-sourcing. We instead propose using ChatGPT for the controllable generation of test sentences, given any arbitrary user-specified combination of social groups and attributes appearing in the test sentences. Compared with template-based methods, our approach using ChatGPT for test sentence generation is superior at detecting social bias, especially in challenging settings such as intersectional biases. We present an open-source, comprehensive bias testing framework (BiasTestGPT), hosted on HuggingFace, that can be plugged into any open-source PLM for bias testing. User testing with domain experts from various fields showed their interest in being able to test modern AI for social biases. Our tool significantly improved their awareness of such biases in PLMs and proved to be learnable and user-friendly. We thus enable seamless, open-ended social bias testing of PLMs by domain experts through automatic, large-scale generation of diverse test sentences for any combination of social categories and attributes.
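The abstract states that bias is measured through the probabilities a PLM assigns to test sentences involving different social groups and attributes. One common way to turn that idea into a number, in the spirit of paired-sentence benchmarks, is to swap the group term in each test sentence and count how often the model prefers the stereotype-consistent version. The sketch below assumes that scheme; the framework's exact metric may differ. The names `stereotype_preference_rate` and `make_pair` are hypothetical, and `toy_scores` is a made-up lookup table standing in for a real model's sentence log-probabilities, not measured data.

```python
from typing import Callable, Sequence, Tuple


def make_pair(template: str, stereo_group: str, other_group: str) -> Tuple[str, str]:
    """Fill a test-sentence template with the stereotyped group and its counterpart."""
    return template.format(group=stereo_group), template.format(group=other_group)


def stereotype_preference_rate(
    pairs: Sequence[Tuple[str, str]],
    log_prob: Callable[[str], float],
) -> float:
    """Fraction of pairs where the model scores the stereotype-consistent
    sentence higher than its group-swapped counterpart.
    0.5 means no systematic preference; values near 1.0 suggest bias."""
    if not pairs:
        raise ValueError("need at least one sentence pair")
    preferred = sum(1 for stereo, other in pairs if log_prob(stereo) > log_prob(other))
    return preferred / len(pairs)


# Illustrative stand-in for a PLM's sentence log-probabilities (made-up values).
toy_scores = {
    "The woman worked as a nurse.": -10.0,
    "The man worked as a nurse.": -12.0,
    "The man worked as an engineer.": -11.0,
    "The woman worked as an engineer.": -11.5,
}

pairs = [
    make_pair("The {group} worked as a nurse.", "woman", "man"),
    make_pair("The {group} worked as an engineer.", "man", "woman"),
]
rate = stereotype_preference_rate(pairs, toy_scores.__getitem__)
print(rate)  # 1.0 with these toy values
```

With a real model, `log_prob` would be replaced by a function that scores a sentence with the PLM under test (for example, a pseudo-log-likelihood for a masked language model); passing it in as a callable keeps the metric independent of any particular model library.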

