CL · AI · LG · Dec 4, 2024

Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models

arXiv:2412.03537v1 · 2 citations · h-index: 23
Originality: Incremental advance
AI Analysis

This work addresses fairness concerns for users who deploy language models in real-world systems via prompt adaptation. It shows that biases in the pre-trained model persist strongly after prompting. The advance is incremental: it extends prior bias-transfer research from fine-tuning to prompting.

The study investigated whether gender biases in pre-trained language models transfer to the same models when adapted via prompting. It found strong correlations (rho >= 0.94) between biases in the pre-trained models and biases in their zero- and few-shot prompted counterparts on a pronoun co-reference resolution task.

Large language models (LLMs) are increasingly being adapted to achieve task-specificity for deployment in real-world decision systems. Several previous works have investigated the bias transfer hypothesis (BTH) by studying the effect of the fine-tuning adaptation strategy on model fairness, finding that fairness in pre-trained masked language models has limited effect on the fairness of models adapted using fine-tuning. In this work, we expand the study of BTH to causal models under prompt adaptations, as prompting is an accessible and compute-efficient way to deploy models in real-world systems. In contrast to previous works, we establish that intrinsic biases in pre-trained Mistral, Falcon and Llama models are strongly correlated (rho >= 0.94) with biases when the same models are zero- and few-shot prompted, using a pronoun co-reference resolution task. Further, we find that bias transfer remains strongly correlated even when LLMs are specifically prompted to exhibit fair or biased behavior (rho >= 0.92), and when few-shot length and stereotypical composition are varied (rho >= 0.97). Our findings highlight the importance of ensuring fairness in pre-trained LLMs, especially when they are later used to perform downstream tasks via prompt adaptation.
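To make the reported correlations concrete, the following is a minimal sketch of how a coefficient such as rho could be computed between bias scores measured on a pre-trained model and on the same model after prompting. The abstract does not say which correlation coefficient rho denotes or how bias is scored, so both a Pearson and a Spearman variant are shown. The function names and the per-occupation scores are hypothetical and made up purely for illustration; they are not results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical per-occupation bias scores (illustrative values only).
intrinsic_bias = [0.10, 0.35, -0.20, 0.50, -0.05]  # pre-trained model
prompted_bias = [0.12, 0.30, -0.25, 0.55, -0.02]   # same model, prompted

if __name__ == "__main__":
    print("Pearson: ", pearson(intrinsic_bias, prompted_bias))
    print("Spearman:", spearman(intrinsic_bias, prompted_bias))
```

A value close to 1 under either variant would mean that the occupations on which the pre-trained model is most biased are also the ones on which the prompted model is most biased, which is the sense in which the abstract describes bias as transferring.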
