CL, SI
Oct 21, 2022

Discovering Differences in the Representation of People using Contextualized Semantic Axes

Berkeley
arXiv:2210.12170v1293 citations
h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses the need for more accurate semantic analysis in social and temporal contexts, though it is incremental as it builds on existing embedding paradigms.

The paper addresses the problem of identifying semantic differences across contexts by extending the semantic-axis paradigm from static word embeddings to BERT embeddings. Its contextualized semantic axes mitigate the pitfall where antonyms have neighboring representations. The authors demonstrate the approach on two datasets, occupations from Wikipedia and discussions in extremist men's communities, and show for the latter that references to women became more detestable over time.

A common paradigm for identifying semantic differences across social and temporal contexts is the use of static word embeddings and their distances. In particular, past work has compared embeddings against "semantic axes" that represent two opposing concepts. We extend this paradigm to BERT embeddings, and construct contextualized axes that mitigate the pitfall where antonyms have neighboring representations. We validate and demonstrate these axes on two people-centric datasets: occupations from Wikipedia, and multi-platform discussions in extremist, men's communities over fourteen years. In both studies, contextualized semantic axes can characterize differences among instances of the same word type. In the latter study, we show that references to women and the contexts around them have become more detestable over time.
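The idea of comparing an embedding against a semantic axis can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation: it assumes an axis is built as the difference between the mean embeddings of two opposing poles and that an occurrence is scored by cosine similarity to that axis. The helper names `build_axis` and `project` are hypothetical, and the 3-dimensional toy vectors stand in for real contextualized BERT embeddings.

```python
import numpy as np

def build_axis(pole_a, pole_b):
    """Axis vector: mean of one pole's embeddings minus the mean of the other's."""
    return np.mean(pole_a, axis=0) - np.mean(pole_b, axis=0)

def project(embedding, axis):
    """Cosine similarity between an embedding and the axis (range -1 to 1)."""
    denom = np.linalg.norm(embedding) * np.linalg.norm(axis)
    return float(np.dot(embedding, axis) / denom)

# Toy vectors (assumption): in practice each row would be a contextualized
# embedding of a pole word taken from some sentence.
pole_positive = np.array([[1.0, 0.2, 0.0], [0.8, 0.0, 0.1]])
pole_negative = np.array([[-1.0, 0.1, 0.0], [-0.9, 0.0, 0.2]])
axis = build_axis(pole_positive, pole_negative)

# Two occurrences of the same word type in different contexts (toy values).
occurrence_a = np.array([0.7, 0.1, 0.3])
occurrence_b = np.array([-0.6, 0.2, 0.3])
score_a = project(occurrence_a, axis)
score_b = project(occurrence_b, axis)

print(f"occurrence_a leans toward the positive pole: {score_a > 0}")
print(f"occurrence_b leans toward the negative pole: {score_b < 0}")
```

Because every occurrence gets its own score, two instances of the same word type can fall on opposite sides of an axis, which is the kind of within-word difference the abstract describes.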

Code Implementations: 1 repo