CY, CL | May 7, 2024

The Silicon Ceiling: Auditing GPT's Race and Gender Biases in Hiring

arXiv:2405.04412v3 | 55 citations | h-index: 13 | EAAMO
Originality: Incremental advance
AI Analysis

This paper addresses fairness concerns in AI-driven hiring that affect both employers and job seekers, but its contribution is incremental because it builds on existing bias literature.

The study audited GPT-3.5 for race and gender biases in hiring, finding that it reflected stereotypes in resume assessments and generated resumes with biased markers like less experience for women and immigrant indicators for Asian and Hispanic names.

Large language models (LLMs) are increasingly being introduced in workplace settings, with the goals of improving efficiency and fairness. However, concerns have arisen regarding these models' potential to reflect or exacerbate social biases and stereotypes. This study explores the potential impact of LLMs on hiring practices. To do so, we conduct an AI audit of race and gender biases in one commonly-used LLM, OpenAI's GPT-3.5, taking inspiration from the history of traditional offline resume audits. We conduct two studies using names with varied race and gender connotations: resume assessment (Study 1) and resume generation (Study 2). In Study 1, we ask GPT to score resumes with 32 different names (4 names for each combination of the 2 gender and 4 racial groups) and two anonymous options across 10 occupations and 3 evaluation tasks (overall rating, willingness to interview, and hireability). We find that the model reflects some biases based on stereotypes. In Study 2, we prompt GPT to create resumes (10 for each name) for fictitious job candidates. When generating resumes, GPT reveals underlying biases; women's resumes had occupations with less experience, while Asian and Hispanic resumes had immigrant markers, such as non-native English and non-U.S. education and work experiences. Our findings contribute to a growing body of literature on LLM biases, particularly in workplace contexts.
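The size of the Study 1 design follows directly from the counts in the abstract, and a minimal sketch makes the arithmetic explicit. Every label below (`gender_a`, `group_1`, `occupation_1`, and so on) is a hypothetical placeholder, since the abstract does not list the actual names or occupations. The sketch counts unique design cells only; it assumes nothing about how many times each cell was prompted.

```python
from itertools import product

# Placeholder labels only: the abstract gives the counts, not the actual
# names, racial group labels, or occupations used in the study.
GENDERS = ["gender_a", "gender_b"]                        # 2 gender groups
RACIAL_GROUPS = [f"group_{i}" for i in range(1, 5)]       # 4 racial groups
NAMES_PER_CELL = 4                                        # 4 names per gender x race cell
ANONYMOUS = ["anonymous_1", "anonymous_2"]                # 2 anonymous options
OCCUPATIONS = [f"occupation_{i}" for i in range(1, 11)]   # 10 occupations
TASKS = ["overall rating", "willingness to interview", "hireability"]


def candidate_labels():
    """Return the 32 named candidates plus the 2 anonymous options."""
    named = [
        f"{gender}/{group}/name_{k}"
        for gender, group in product(GENDERS, RACIAL_GROUPS)
        for k in range(1, NAMES_PER_CELL + 1)
    ]
    return named + ANONYMOUS


def study1_conditions():
    """Enumerate every (candidate, occupation, task) cell of the design."""
    return list(product(candidate_labels(), OCCUPATIONS, TASKS))


if __name__ == "__main__":
    labels = candidate_labels()
    conditions = study1_conditions()
    print(f"candidate labels: {len(labels)}")
    print(f"unique Study 1 cells: {len(conditions)}")
```

Under these assumptions the grid has 2 x 4 x 4 = 32 named candidates, 34 candidate labels once the two anonymous options are added, and 34 x 10 x 3 = 1,020 unique cells.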

