HC · AI · CY · Oct 23, 2025

Race and Gender in LLM-Generated Personas: A Large-Scale Audit of 41 Occupations

Ilona van der Linden, Sahana Kumar, Arnav Dixit, Aadi Sudan, Smruthi Danda, David C. Anastasiu, Kai Lukoff

arXiv:2510.21011v1 · 4.1 · h-index: 21

Originality: Incremental advance

AI Analysis

This work addresses bias in AI-generated occupational portrayals, which can affect visibility and fairness in applications such as hiring or media. It is an incremental advance because it builds on existing audit methods.

The study audited more than 1.5 million occupational personas generated by four large language models across 41 U.S. occupations. Compared with Bureau of Labor Statistics data, it found systematic shifts and stereotype exaggeration in race and gender representation. For example, White workers are underrepresented by 31 percentage points on average, and Hispanic workers are overrepresented by 17 percentage points.

Generative AI tools are increasingly used to create portrayals of people in occupations, raising concerns about how race and gender are represented. We conducted a large-scale audit of over 1.5 million occupational personas across 41 U.S. occupations, generated by four large language models with different AI safety commitments and countries of origin (U.S., China, France). Compared with Bureau of Labor Statistics data, we find two recurring patterns: systematic shifts, where some groups are consistently under- or overrepresented, and stereotype exaggeration, where existing demographic skews are amplified. On average, White (−31pp) and Black (−9pp) workers are underrepresented, while Hispanic (+17pp) and Asian (+12pp) workers are overrepresented. These distortions can be extreme: for example, across all four models, Housekeepers are portrayed as nearly 100% Hispanic, while Black workers are erased from many occupations. For HCI, these findings show provider choice materially changes who is visible, motivating model-specific audits and accountable design practices.
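The two patterns named in the abstract can be made concrete with a small calculation. The sketch below assumes that both the generated personas and the BLS benchmark have already been reduced to per-occupation group shares. The occupation names, shares, and the workforce-wide baseline are hypothetical placeholders, not the paper's data. `is_exaggerated` is one plausible way to operationalize stereotype exaggeration and is not the authors' stated definition.

```python
from statistics import mean

# Hypothetical shares (fractions of personas or workers); NOT figures from the paper.
BLS = {
    "occupation_a": {"White": 0.60, "Black": 0.15, "Hispanic": 0.20, "Asian": 0.05},
    "occupation_b": {"White": 0.70, "Black": 0.10, "Hispanic": 0.10, "Asian": 0.10},
}
MODEL = {
    "occupation_a": {"White": 0.30, "Black": 0.05, "Hispanic": 0.55, "Asian": 0.10},
    "occupation_b": {"White": 0.45, "Black": 0.05, "Hispanic": 0.25, "Asian": 0.25},
}


def pp_diff(model_share: float, bls_share: float) -> float:
    """Model share minus benchmark share, in percentage points."""
    return 100 * (model_share - bls_share)


def systematic_shift(model: dict, bls: dict, group: str) -> float:
    """Mean percentage-point difference for one group across all occupations.
    A consistently negative value means under-representation; positive means over-representation."""
    return mean(pp_diff(model[occ][group], bls[occ][group]) for occ in bls)


def is_exaggerated(model_share: float, bls_share: float, baseline: float) -> bool:
    """Assumed definition: the model moves a group's share further from its
    workforce-wide baseline, in the same direction as the real-world skew."""
    real_skew = bls_share - baseline
    model_skew = model_share - baseline
    return real_skew * model_skew > 0 and abs(model_skew) > abs(real_skew)


if __name__ == "__main__":
    for group in ["White", "Black", "Hispanic", "Asian"]:
        print(group, round(systematic_shift(MODEL, BLS, group), 1))
    # Hypothetical baseline: Hispanic workers make up 15% of the whole workforce.
    print(is_exaggerated(MODEL["occupation_a"]["Hispanic"], BLS["occupation_a"]["Hispanic"], 0.15))
```

In this toy data, the Hispanic share in `occupation_a` is already above the assumed baseline and the model pushes it further, so it counts as exaggeration. In `occupation_b`, the model flips a below-baseline share to above-baseline. That case shows up as a shift but not as exaggeration under this definition.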

