CL | Nov 25, 2024

Profiling Bias in LLMs: Stereotype Dimensions in Contextual Word Embeddings

arXiv:2411.16527v2 | 11 citations | h-index: 3 | NoDaLiDa/Baltic-HLT
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for accessible bias descriptions in AI to communicate risks and encourage mitigation, though it is incremental as it applies existing social psychology concepts to LLMs.

The paper tackles bias in large language models by proposing bias profiles, built on stereotype dimensions from social psychology, that describe a model's discriminatory properties intuitively. The authors investigate gender bias in contextual embeddings across contexts and layers, and generate profiles for twelve LLMs to expose and visualize that bias.

Large language models (LLMs) are the foundation of the current successes of artificial intelligence (AI); however, they are unavoidably biased. To effectively communicate the risks and encourage mitigation efforts, these models need adequate and intuitive descriptions of their discriminatory properties, appropriate for all audiences of AI. We suggest bias profiles with respect to stereotype dimensions based on dictionaries from social psychology research. Along these dimensions, we investigate gender bias in contextual embeddings, across contexts and layers, and generate stereotype profiles for twelve different LLMs, demonstrating their intuitiveness and their use for exposing and visualizing bias.
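To make the idea of a stereotype profile concrete, here is a minimal sketch of one plausible way to compute it. It is not the authors' implementation: it assumes each stereotype dimension is a direction between the mean embeddings of a "low" and a "high" dictionary pole, and that bias is the difference in mean projection between two groups of terms. The dimension names (`warmth`, `competence`), the toy three-dimensional vectors, and all function names are hypothetical. In practice the vectors would be contextual embeddings taken from a chosen model layer.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def dimension_direction(low_pole, high_pole):
    """Unit direction pointing from the low pole to the high pole (assumed definition)."""
    low, high = mean_vector(low_pole), mean_vector(high_pole)
    return unit([h - l for h, l in zip(high, low)])

def mean_projection(vectors, direction):
    """Average cosine similarity between each vector and the direction."""
    return sum(dot(unit(v), direction) for v in vectors) / len(vectors)

def bias_profile(group_a, group_b, directions):
    """Per-dimension score: positive means group_a leans toward the high pole more than group_b."""
    return {
        name: mean_projection(group_a, d) - mean_projection(group_b, d)
        for name, d in directions.items()
    }

# Toy data standing in for embeddings of dictionary words (hypothetical values).
directions = {
    "warmth": dimension_direction([[-1.0, 0.0, 0.1]], [[1.0, 0.0, 0.1]]),
    "competence": dimension_direction([[0.0, -1.0, 0.1]], [[0.0, 1.0, 0.1]]),
}
# Toy data standing in for embeddings of two groups of gendered terms.
group_a = [[0.6, -0.2, 0.3], [0.4, -0.1, 0.2]]
group_b = [[-0.3, 0.5, 0.3], [-0.2, 0.4, 0.2]]

profile = bias_profile(group_a, group_b, directions)
for name, score in profile.items():
    print(f"{name}: {score:+.3f}")
```

Repeating the same computation per layer and per context template would give the kind of cross-layer, cross-context comparison the abstract describes, under the assumptions stated above.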

Code Implementations: 1 repo
