A Taxonomy of Stereotype Content in Large Language Models
This work addresses the problem of understanding and mitigating bias in AI systems, with relevance for developers and auditors. It is incremental in that it builds on existing research on human stereotypes.
The study introduces a taxonomy of stereotype content in large language models (LLMs). It identifies 14 dimensions that account for about 90% of stereotype associations, with Warmth and Competence the most frequent, and finds that stereotypes are more positive in LLMs than in humans but vary across categories and dimensions.
This study introduces a taxonomy of stereotype content in contemporary large language models (LLMs). We prompt ChatGPT 3.5, Llama 3, and Mixtral 8x7B, three powerful and widely used LLMs, for the characteristics associated with 87 social categories (e.g., gender, race, occupations). We identify 14 stereotype dimensions (e.g., Morality, Ability, Health, Beliefs, Emotions), accounting for ~90% of LLM stereotype associations. Warmth and Competence facets were the most frequent content, but all other dimensions were also significantly prevalent. Stereotypes were more positive in LLMs than in humans, but there was significant variability across categories and dimensions. Finally, the taxonomy predicted the LLMs' internal evaluations of social categories (e.g., how positively or negatively the categories were represented), supporting the relevance of a multidimensional taxonomy for characterizing LLM stereotypes. Our findings suggest that high-dimensional human stereotypes are reflected in LLMs and must be considered in AI auditing and debiasing to minimize unidentified harms from reliance on low-dimensional views of bias in LLMs.
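To make the notion of "accounting for ~90% of LLM stereotype associations" concrete, the following is a minimal sketch of how coverage and per-dimension frequency could be computed once responses are coded against a dictionary. It is an illustration only, not the study's pipeline: the dimension names are taken from the abstract, while `TOY_DICTIONARY`, `code_responses`, and the example words are hypothetical and invented for this sketch.

```python
from collections import Counter

# Hypothetical mini-dictionary mapping characteristic words to dimensions.
# Dimension names come from the abstract; the word lists are illustrative only.
TOY_DICTIONARY = {
    "honest": "Morality",
    "trustworthy": "Morality",
    "skilled": "Ability",
    "intelligent": "Ability",
    "fit": "Health",
    "religious": "Beliefs",
    "cheerful": "Emotions",
}


def code_responses(responses, dictionary):
    """Return (coverage, per-dimension counts) for a list of characteristic words.

    coverage is the share of responses that map to any dimension in the dictionary.
    """
    counts = Counter()
    covered = 0
    for word in responses:
        dimension = dictionary.get(word.strip().lower())
        if dimension is not None:
            counts[dimension] += 1
            covered += 1
    coverage = covered / len(responses) if responses else 0.0
    return coverage, counts


# Made-up responses for a single social category (not data from the study).
toy_responses = ["Honest", "skilled", "fit", "tall", "cheerful"]
coverage, counts = code_responses(toy_responses, TOY_DICTIONARY)
print(f"coverage = {coverage:.2f}")
print(counts.most_common())
```

In this toy input, "tall" matches no dimension, so four of the five responses are covered. A real analysis would need a far larger dictionary and some handling of multi-word or ambiguous responses, which this sketch does not attempt.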