CL, Apr 25, 2020

When do Word Embeddings Accurately Reflect Surveys on our Beliefs About People?

arXiv:2004.12043v1, 1001 citations
AI Analysis

This work addresses the reliability of using word embeddings to study societal biases, which is crucial for researchers and practitioners in AI and social sciences to avoid misinterpretations in downstream applications.

The study investigated whether word embeddings accurately reflect social biases as measured by traditional surveys. On average, embedding-based biases closely mirror survey data across seventeen dimensions of social meaning, but accuracy varies considerably by dimension: embeddings track survey data much better for gender than for race.

Social biases are encoded in word embeddings. This presents a unique opportunity to study society historically and at scale, and a unique danger when embeddings are used in downstream applications. Here, we investigate the extent to which publicly-available word embeddings accurately reflect beliefs about certain kinds of people as measured via traditional survey methods. We find that biases found in word embeddings do, on average, closely mirror survey data across seventeen dimensions of social meaning. However, we also find that biases in embeddings are much more reflective of survey data for some dimensions of meaning (e.g. gender) than others (e.g. race), and that we can be highly confident that embedding-based measures reflect survey data only for the most salient biases.
