CLAICYLGFeb 3, 2024

Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes

arXiv:2402.01981v170 citationsh-index: 41
Originality Incremental advance
AI Analysis

This addresses bias mitigation for LLM users in a zero-shot, accessible manner, though it is incremental as it builds on existing recognition of biases.

The paper tackles the problem of harmful social biases in large language models (LLMs) by introducing zero-shot self-debiasing techniques, which significantly reduce stereotyping across nine social groups without requiring model modifications.

Large language models (LLMs) have shown remarkable advances in language generation and understanding but are also prone to exhibiting harmful social biases. While recognition of these behaviors has generated an abundance of bias mitigation techniques, most require modifications to the training data, model parameters, or decoding strategy, which may be infeasible without access to a trainable model. In this work, we leverage the zero-shot capabilities of LLMs to reduce stereotyping in a technique we introduce as zero-shot self-debiasing. With two approaches, self-debiasing via explanation and self-debiasing via reprompting, we show that self-debiasing can significantly reduce the degree of stereotyping across nine different social groups while relying only on the LLM itself and a simple prompt, with explanations correctly identifying invalid assumptions and reprompting delivering the greatest reductions in bias. We hope this work opens inquiry into other zero-shot techniques for bias mitigation.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes