CL · LG · Sep 9, 2025

Bias after Prompting: Persistent Discrimination in Large Language Models

arXiv:2509.08146v2 · 3 citations · h-index: 23 · EMNLP
Originality: Incremental advance
AI Analysis

This work shows that discrimination persists in LLMs used in real-world applications and that current prompt-based mitigation methods are insufficient. The advance is incremental: it builds on prior research into bias transfer.

The study examines whether biases transfer from pre-trained large language models to models adapted through prompting. It finds that biases persist strongly across demographics and tasks (for example, rho >= 0.94 for gender in co-reference resolution) and that popular debiasing strategies fail to mitigate this transfer consistently.

A dangerous assumption that can be made from prior work on the bias transfer hypothesis (BTH) is that biases do not transfer from pre-trained large language models (LLMs) to adapted models. We invalidate this assumption by studying the BTH in causal models under prompt adaptations, as prompting is an extremely popular and accessible adaptation strategy used in real-world applications. In contrast to prior work, we find that biases can transfer through prompting and that popular prompt-based mitigation methods do not consistently prevent biases from transferring. Specifically, the correlation between intrinsic biases and those after prompt adaptation remains moderate to strong across demographics and tasks -- for example, gender (rho >= 0.94) in co-reference resolution, and age (rho >= 0.98) and religion (rho >= 0.69) in question answering. Further, we find that biases remain strongly correlated when varying few-shot composition parameters, such as sample size, stereotypical content, occupational distribution and representational balance (rho >= 0.90). We evaluate several prompt-based debiasing strategies and find that different approaches have distinct strengths, but none consistently reduces bias transfer across models, tasks or demographics. These results demonstrate that correcting bias, and potentially improving reasoning ability, in intrinsic models may prevent propagation of biases to downstream tasks.
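The correlation figures above compare a bias measurement taken on the pre-trained model with the same measurement taken after prompt adaptation. The sketch below shows one way such a comparison could be computed. It rests on several assumptions: the text does not state which correlation coefficient rho denotes, so Pearson correlation is used here purely for illustration; the function name `pearson` and the per-group scores are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical bias scores, one per group (e.g. per occupation).
# These numbers are illustrative only and are NOT results from the paper.
intrinsic_bias = [0.10, 0.35, -0.20, 0.50, -0.05]  # pre-trained model
prompted_bias = [0.12, 0.30, -0.15, 0.55, 0.00]    # after prompt adaptation

rho = pearson(intrinsic_bias, prompted_bias)
print(f"rho = {rho:.2f}")
```

A value of rho close to 1 would indicate that groups with a high bias score before prompting tend to keep a high score afterwards, which is the pattern the abstract describes as bias transfer.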
