Small Changes, Big Impact: Demographic Bias in LLM-Based Hiring Through Subtle Sociocultural Markers in Anonymised Resumes
For hiring fairness practitioners, the study shows that current anonymisation practices are insufficient to prevent LLM bias, because job-irrelevant markers that survive redaction still skew outcomes.
LLMs exhibit systematic demographic bias in resume screening even when explicit PII is redacted: subtle sociocultural markers (languages, hobbies) let models recover ethnicity and gender with high F1, and screening outcomes favour markers associated with Chinese and Caucasian males. Prompting for explanations can paradoxically amplify bias.
Large Language Models (LLMs) are increasingly deployed in resume screening pipelines. Although explicit PII (e.g., names) is commonly redacted, resumes typically retain subtle sociocultural markers (languages, co-curricular activities, volunteering, hobbies) that can act as demographic proxies. We introduce a generalisable stress-test framework for hiring fairness, instantiated in the Singapore context: 100 neutral job-aligned resumes are augmented into 4100 variants spanning four ethnicities and two genders, differing only in job-irrelevant markers. We evaluate 18 LLMs in two settings: (i) Direct Comparison (1v1) and (ii) Score & Shortlist (Top-Score Rates), each with and without rationale prompting. We find that even without explicit identifiers, models recover demographic attributes with high F1 and exhibit systematic disparities, favouring markers associated with Chinese and Caucasian males. Ablations show that language markers suffice for inferring ethnicity, while hobbies and activities are used to infer gender. Furthermore, prompting for explanations may paradoxically amplify bias. Our findings suggest that seemingly innocuous markers surviving anonymisation can materially skew automated hiring outcomes.
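
To make the Score & Shortlist setting concrete, the following is a minimal sketch of how a top-score rate per demographic group might be computed from model scores. It rests on assumptions not stated in the abstract: that each base resume has one scored variant per group, that a variant counts as "top-scoring" when it attains or ties for the highest score among variants of the same base resume, and that disparity is summarised as the gap between the highest and lowest rate. The function names (`top_score_rates`, `max_rate_gap`), the group labels, and the toy scores are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def top_score_rates(scores):
    """Fraction of base resumes in which each group's variant attains
    (or ties for) the highest score among that resume's variants.

    scores: dict mapping base-resume id -> dict of group label -> score.
    Assumed definition of "top-score rate"; the paper may define it differently.
    """
    top_counts = defaultdict(int)   # times a group's variant was top-scoring
    seen_counts = defaultdict(int)  # times a group's variant was scored at all
    for group_scores in scores.values():
        if not group_scores:
            continue
        best = max(group_scores.values())
        for group, score in group_scores.items():
            seen_counts[group] += 1
            if score == best:
                top_counts[group] += 1
    return {g: top_counts[g] / seen_counts[g] for g in seen_counts}


def max_rate_gap(rates):
    """Largest difference in top-score rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Toy, made-up scores for two base resumes and four placeholder groups.
toy_scores = {
    "resume_1": {"group_a_m": 8, "group_a_f": 7, "group_b_m": 8, "group_b_f": 6},
    "resume_2": {"group_a_m": 9, "group_a_f": 9, "group_b_m": 7, "group_b_f": 7},
}

rates = top_score_rates(toy_scores)
gap = max_rate_gap(rates)
print(rates)
print(gap)
```

Because the variants of a base resume differ only in job-irrelevant markers, an unbiased scorer would be expected to give every group a similar top-score rate; under the assumptions above, a large gap indicates that the markers alone are shifting shortlisting outcomes.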