CL | May 5, 2025

Colombian Waitresses y Jueces canadienses: Gender and Country Biases in Occupation Recommendations from LLMs

arXiv:2505.02456v23 citations | h-index: 10 | Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Originality: Incremental advance
AI Analysis

The paper addresses fairness in NLP by showing that bias measurement and mitigation need intersectional and multilingual approaches. The advance is incremental: it extends existing bias research to new dimensions rather than introducing a new line of inquiry.

The study measured gender and country biases in occupation recommendations from large language models prompted in English, Spanish, and German. It found that intersectional biases persist even when models show parity for gender or country individually, that the prompting language significantly affects bias, and that instruction-tuned models show the lowest and most stable bias levels.

One of the goals of fairness research in NLP is to measure and mitigate stereotypical biases that are propagated by NLP systems. However, such work tends to focus on single axes of bias (most often gender) and the English language. Addressing these limitations, we contribute the first study of multilingual intersecting country and gender biases, with a focus on occupation recommendations generated by large language models. We construct a benchmark of prompts in English, Spanish and German, where we systematically vary country and gender, using 25 countries and four pronoun sets. Then, we evaluate a suite of 5 Llama-based models on this benchmark, finding that LLMs encode significant gender and country biases. Notably, we find that even when models show parity for gender or country individually, intersectional occupational biases based on both country and gender persist. We also show that the prompting language significantly affects bias, and instruction-tuned models consistently demonstrate the lowest and most stable levels of bias. Our findings highlight the need for fairness researchers to use intersectional and multilingual lenses in their work.
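The claim that intersectional bias can persist under single-axis parity can be illustrated with a small arithmetic sketch. The counts below are invented for illustration and are not results from the paper; the country labels, the occupation, and the max-minus-min "rate gap" are assumptions chosen for simplicity, not the authors' benchmark or metric.

```python
from collections import defaultdict

# Toy counts, invented for illustration only (NOT results from the paper):
# how often a hypothetical model recommends "judge" out of 100 prompts per cell.
PROMPTS_PER_CELL = 100
judge_counts = {
    ("Country A", "she"): 10,
    ("Country A", "he"): 30,
    ("Country B", "she"): 30,
    ("Country B", "he"): 10,
}


def marginal_rates(counts, axis):
    """Recommendation rate per value of one attribute (0 = country, 1 = gender)."""
    hits = defaultdict(int)
    total = defaultdict(int)
    for key, n in counts.items():
        hits[key[axis]] += n
        total[key[axis]] += PROMPTS_PER_CELL
    return {value: hits[value] / total[value] for value in hits}


def rate_gap(rates):
    """Assumed disparity measure: largest rate minus smallest rate."""
    return max(rates.values()) - min(rates.values())


country_rates = marginal_rates(judge_counts, 0)
gender_rates = marginal_rates(judge_counts, 1)
cell_rates = {key: n / PROMPTS_PER_CELL for key, n in judge_counts.items()}

print("gap by country:       ", rate_gap(country_rates))
print("gap by gender:        ", rate_gap(gender_rates))
print("gap by country+gender:", rate_gap(cell_rates))
```

In this toy setup each country and each pronoun receives the recommendation at the same overall rate, so both single-axis gaps are zero, while the gap across country-and-gender cells is roughly 0.2. An audit that checks only one axis at a time would miss the disparity entirely.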

