CL, Feb 6, 2025

My LLM might Mimic AAE -- But When Should it?

Sandra C. Sandoval, Christabel Acquaye, Kwesi Cobbina, Mohammad Nayeem Teli, Hal Daumé

arXiv:2502.04564v2, 2.74 citations, h-index: 7, Has Code, NAACL

Originality: Synthesis-oriented

AI Analysis

This research addresses the representation of African American English in AI systems and offers insight into user preferences for cultural authenticity in language models. Its contribution is incremental: it studies perceptions and contexts of use rather than developing new methods.

The study investigated how Black Americans perceive the authenticity of African American English (AAE) produced by large language models (LLMs) and when they want LLMs to use it. Participants favored autonomy in deciding when AAE is appropriate: they preferred AAE in informal settings and a default of Mainstream U.S. English in formal ones. When appropriately prompted and given in-context examples, LLM outputs were rated as comparably authentic to transcripts of Black American speech.

We examine the representation of African American English (AAE) in large language models (LLMs), exploring (a) the perceptions Black Americans have of how effective these technologies are at producing authentic AAE, and (b) in what contexts Black Americans find this desirable. Through both a survey of Black Americans ($n = 104$) and annotation of LLM-produced AAE by Black Americans ($n = 228$), we find that Black Americans favor choice and autonomy in determining when AAE is appropriate in LLM output. They tend to prefer that LLMs default to communicating in Mainstream U.S. English in formal settings, with greater interest in AAE production in less formal settings. When LLMs were appropriately prompted and provided in-context examples, our participants found their outputs to have a level of AAE authenticity on par with transcripts of Black American speech. Select code and data for our project can be found here: https://github.com/smelliecat/AAEMime.git
