CL, AI, CY | Mar 6, 2024

Impoverished Language Technology: The Lack of (Social) Class in NLP

Amanda Cercas Curry, Zeerak Talat, Dirk Hovy

arXiv:2403.03874v1 | 23.982 citations | h-index: 23 | LREC

Originality: Synthesis-oriented

AI Analysis

This work highlights a critical oversight in NLP technology development, one that may limit the applicability and fairness of NLP systems across socio-economic groups; its contribution toward closing this gap is incremental.

The paper identifies a significant gap in NLP research: only 20 papers mention socio-economic status, and most of them engage with class only by collecting annotator demographics. It proposes a definition of class that NLP researchers can operationalize and argues for including socio-economic class in future language technologies.

Since Labov's (1964) foundational work on the social stratification of language, linguistics has dedicated concerted efforts towards understanding the relationships between socio-demographic factors and language production and perception. Despite the large body of evidence identifying significant relationships between socio-demographic factors and language production, relatively few of these factors have been investigated in the context of Natural Language Processing (NLP) technology. While age and gender are well covered, Labov's initial target, socio-economic class, is largely absent. We survey the existing NLP literature and find that only 20 papers even mention socio-economic status. Moreover, the majority of those papers do not engage with class beyond collecting information on annotator demographics. Given this research lacuna, we provide a definition of class that can be operationalised by NLP researchers, and argue for including socio-economic class in future language technologies.
