Prompt and Prejudice
The paper addresses biases in AI systems, a concern for responsible AI development, though its contribution is incremental because it builds on existing auditing methodologies.
This paper investigates demographic biases in Large Language Models (LLMs) and Vision Language Models (VLMs) by appending first names to ethical decision-making tasks, revealing biases across thousands of moral scenarios using over 300 names. It introduces the Practical Scenarios Benchmark (PSB) to assess biases in everyday and practical applications like granting mortgages.
This paper investigates the impact of using first names in Large Language Models (LLMs) and Vision Language Models (VLMs), particularly when prompted with ethical decision-making tasks. We propose an approach that appends first names to ethically annotated text scenarios to reveal demographic biases in model outputs. Our study involves a curated list of more than 300 names representing diverse genders and ethnic backgrounds, tested across thousands of moral scenarios. Following the auditing methodologies from social sciences, we propose a detailed analysis involving popular LLMs/VLMs to contribute to the field of responsible AI by emphasizing the importance of recognizing and mitigating biases in these systems. Furthermore, we introduce a novel benchmark, the Practical Scenarios Benchmark (PSB), designed to assess the presence of biases involving gender or demographic prejudices in everyday decision-making scenarios as well as practical scenarios where an LLM might be used to make sensible decisions (e.g., granting mortgages or insurance). This benchmark allows for a comprehensive comparison of model behaviors across different demographic categories, highlighting the risks and biases that may arise in practical applications of LLMs and VLMs.
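The name-appending audit described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the prompt template, the two-entry name list, the group labels, and the `toy_model` stand-in, which replaces a real LLM/VLM call with a deliberately biased rule so that the per-group comparison has something to show.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical name list: (first name, demographic group label).
# The actual study uses more than 300 curated names.
NAMES: List[Tuple[str, str]] = [("Alice", "group_a"), ("Bob", "group_b")]


def append_name(scenario: str, name: str) -> str:
    """Append a first name to a scenario (the template is an assumption)."""
    return f"{scenario} The person's name is {name}."


def approval_rate_by_group(
    scenarios: Iterable[str],
    names: Iterable[Tuple[str, str]],
    model: Callable[[str], bool],
) -> Dict[str, float]:
    """Return the fraction of favorable decisions per demographic group."""
    names = list(names)
    counts = defaultdict(lambda: [0, 0])  # group -> [favorable, total]
    for scenario in scenarios:
        for name, group in names:
            decision = model(append_name(scenario, name))
            counts[group][0] += int(decision)
            counts[group][1] += 1
    return {group: fav / total for group, (fav, total) in counts.items()}


def toy_model(prompt: str) -> bool:
    """Stand-in for a real model call; biased on purpose for illustration."""
    return "Alice" in prompt


rates = approval_rate_by_group(
    ["Should this mortgage application be granted?"], NAMES, toy_model
)
```

Because the scenario text is identical across names, any gap between the per-group rates can be attributed to the name alone, which is the core idea of the audit. In practice `toy_model` would be replaced by a call to the model under test plus a step that parses its answer into a decision.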