CL, May 26, 2023

Nichelle and Nancy: The Influence of Demographic Attributes and Tokenization Length on First Name Biases

arXiv:2305.16577v1
224 citations
Originality: Incremental advance
AI Analysis

This work addresses social biases in commonsense reasoning models. It is an incremental advance because it extends prior first name substitution experiments by controlling for confounding factors such as corpus frequency and tokenization length.

The study investigated how demographic attributes and tokenization length of first names influence biases in social commonsense reasoning models, finding that both factors systematically affect model behavior.

Through the use of first name substitution experiments, prior research has demonstrated the tendency of social commonsense reasoning models to systematically exhibit social biases along the dimensions of race, ethnicity, and gender (An et al., 2023). Demographic attributes of first names, however, are strongly correlated with corpus frequency and tokenization length, which may influence model behavior independent of or in addition to demographic factors. In this paper, we conduct a new series of first name substitution experiments that measures the influence of these factors while controlling for the others. We find that demographic attributes of a name (race, ethnicity, and gender) and name tokenization length are both factors that systematically affect the behavior of social commonsense reasoning models.
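The controlled design described in the abstract (vary one attribute of a name while holding the others fixed) can be sketched roughly as below. This is a minimal illustration, not the authors' code: the records in `NAMES`, the group labels, the token lengths, the prompt `TEMPLATE`, and the helpers `matched_pairs` and `substituted_prompts` are all hypothetical placeholders. In a real experiment, token lengths would come from the evaluated model's own tokenizer. The sketch only builds matched prompt pairs; it does not query or score a model.

```python
from collections import defaultdict
from itertools import combinations, product

# Hypothetical name records: (name, demographic_group, token_length).
# Placeholder values for illustration only; NOT data from the paper.
NAMES = [
    ("NameA", "group1", 1),
    ("NameB", "group1", 2),
    ("NameC", "group2", 1),
    ("NameD", "group2", 2),
]

# Hypothetical social-commonsense prompt template.
TEMPLATE = "{name} helped a neighbor move. How would others feel about {name}?"

GROUP, LENGTH = 1, 2  # indices into a record


def matched_pairs(records, vary, hold):
    """Pair names that differ in the `vary` attribute but share the `hold` attribute.

    Returns a list of (held_value, name_a, name_b) tuples, so each comparison
    isolates one factor while the other is held constant.
    """
    # strata[held_value][varied_value] -> list of names
    strata = defaultdict(lambda: defaultdict(list))
    for rec in records:
        strata[rec[hold]][rec[vary]].append(rec[0])

    pairs = []
    for held_value in sorted(strata):
        by_level = strata[held_value]
        # Compare every pair of distinct levels of the varied attribute.
        for a, b in combinations(sorted(by_level), 2):
            for name_a, name_b in product(by_level[a], by_level[b]):
                pairs.append((held_value, name_a, name_b))
    return pairs


def substituted_prompts(pairs, template=TEMPLATE):
    """Fill the template with each name of a pair, giving prompts to compare."""
    return [(held, template.format(name=a), template.format(name=b))
            for held, a, b in pairs]


# Compare demographic groups among names with the same token length.
by_group = matched_pairs(NAMES, vary=GROUP, hold=LENGTH)
# Compare token lengths among names from the same demographic group.
by_length = matched_pairs(NAMES, vary=LENGTH, hold=GROUP)
```

Here `by_group` pairs `NameA` with `NameC` (both one token) and `NameB` with `NameD` (both two tokens), so any difference in model output within a pair cannot be attributed to tokenization length. The same helper with the roles swapped yields `by_length`, which holds the demographic group fixed instead.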
