CL · Oct 13, 2024

Evaluating Gender Bias of LLMs in Making Morality Judgements

arXiv:2410.09992v1 · 26 citations · h-index: 6 · Has Code · EMNLP
Originality: Synthesis-oriented
AI Analysis

The paper addresses social bias in AI, a concern for users who rely on LLMs for ethical decisions, but its contribution is incremental: it evaluates existing models on a new dataset.

This work investigates gender bias in large language models when making moral judgments, finding that all tested models, including GPT and Llama families, display significant bias, with GPT-3.5-turbo showing biased opinions in 24% of samples and models consistently favoring female characters in 68-85% of cases.

Large Language Models (LLMs) have shown remarkable capabilities in a multitude of Natural Language Processing (NLP) tasks. However, these models are still not immune to limitations such as social biases, especially gender bias. This work investigates whether current closed- and open-source LLMs possess gender bias, especially when asked to give moral opinions. To evaluate these models, we curate and introduce a new dataset, GenMO (Gender-bias in Morality Opinions), comprising parallel short stories featuring male and female characters, respectively. Specifically, we test models from the GPT family (GPT-3.5-turbo, GPT-3.5-turbo-instruct, GPT-4-turbo), the Llama 3 and 3.1 families (8B/70B), Mistral-7B, and the Claude 3 family (Sonnet and Opus). Surprisingly, despite employing safety checks, all production-standard models we tested display significant gender bias, with GPT-3.5-turbo giving biased opinions in 24% of the samples. Moreover, all models consistently favour female characters, with GPT showing bias in 68-85% of cases and Llama 3 in around 81-85% of instances. Our study also investigates the impact of model parameters on gender bias and explores real-world situations where LLMs reveal biases in moral decision-making.
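The abstract reports two kinds of figures: how often a model's opinion differs between the male and female versions of a story, and how often those differing opinions favour the female character. It does not spell out how these are computed, so the following is only a minimal sketch under stated assumptions: each story pair yields one verdict per version, a pair counts as "biased" when the two verdicts differ, and a biased pair "favours" the female character when her version receives the more lenient verdict. The names `PairedJudgement` and `bias_summary`, and the verdict labels "moral" and "immoral", are hypothetical and not taken from the paper or its code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PairedJudgement:
    """Verdicts for one parallel story pair (hypothetical structure)."""
    male_verdict: str    # verdict for the male-character version, e.g. "moral"
    female_verdict: str  # verdict for the otherwise identical female version


def bias_summary(pairs: List[PairedJudgement]) -> Tuple[float, float]:
    """Return (bias_rate, female_favoured_share).

    bias_rate: fraction of pairs whose two verdicts differ.
    female_favoured_share: among differing pairs, the fraction in which
    the female version got the lenient ("moral") verdict. Both
    definitions are assumptions made for illustration.
    """
    biased = [p for p in pairs if p.male_verdict != p.female_verdict]
    favour_female = [p for p in biased if p.female_verdict == "moral"]
    bias_rate = len(biased) / len(pairs) if pairs else 0.0
    female_share = len(favour_female) / len(biased) if biased else 0.0
    return bias_rate, female_share
```

Under these assumptions, the first number would correspond to a figure like the 24% quoted for GPT-3.5-turbo, and the second to the 68-85% range; the paper's own definitions may differ.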

