CL · Jun 3, 2023

COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements

Xuhui Zhou, Hao Zhu, Akhila Yerukola, Thomas Davidson, Jena D. Hwang, Swabha Swayamdipta, Maarten Sap

AI2, CMU

arXiv:2306.01985v2 · 27.8239 citations · h-index: 49

Originality: Highly original

AI Analysis

This paper addresses toxic language detection for NLP applications by incorporating contextual reasoning, a novel approach rather than an incremental improvement.

The paper tackles offensive language detection by introducing COBRA frames, a context-aware formalism that explains the intents, reactions, and harms of statements based on their social and situational context. It shows that context-aware models outperform context-agnostic ones: context-agnostic models suffer a 29% accuracy drop in cases where the context inverts a statement's offensiveness.

Warning: This paper contains content that may be offensive or upsetting.

Understanding the harms and offensiveness of statements requires reasoning about the social and situational context in which statements are made. For example, the utterance "your English is very good" may implicitly signal an insult when uttered by a white man to a non-white colleague, but uttered by an ESL teacher to their student would be interpreted as a genuine compliment. Such contextual factors have been largely ignored by previous approaches to toxic language detection. We introduce COBRA frames, the first context-aware formalism for explaining the intents, reactions, and harms of offensive or biased statements grounded in their social and situational context. We create COBRACORPUS, a dataset of 33k potentially offensive statements paired with machine-generated contexts and free-text explanations of offensiveness, implied biases, speaker intents, and listener reactions. To study the contextual dynamics of offensiveness, we train models to generate COBRA explanations, with and without access to the context. We find that explanations by context-agnostic models are significantly worse than by context-aware ones, especially in situations where the context inverts the statement's offensiveness (29% accuracy drop). Our work highlights the importance and feasibility of contextualized NLP by modeling social factors.
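The abstract describes each example as a statement paired with a context and several free-text explanation dimensions, and it compares models that do and do not see the context. A minimal sketch of that setup might look like the following. The class names, field names, and prompt layout are illustrative assumptions, not the schema or input format used by the authors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Context:
    # Hypothetical fields: the abstract only says "social and situational context".
    situation: str
    speaker: str
    listener: str


@dataclass
class CobraExample:
    statement: str
    context: Optional[Context] = None
    # Explanation dimensions named in the abstract (free text).
    offensiveness: str = ""
    implied_bias: str = ""
    speaker_intent: str = ""
    listener_reaction: str = ""


def build_input(example: CobraExample, use_context: bool) -> str:
    """Serialize an example into a model input string.

    With use_context=False the context is dropped, which mimics the
    context-agnostic condition. The line layout is an assumption.
    """
    parts = []
    if use_context and example.context is not None:
        c = example.context
        parts.append(f"Situation: {c.situation}")
        parts.append(f"Speaker: {c.speaker}")
        parts.append(f"Listener: {c.listener}")
    parts.append(f"Statement: {example.statement}")
    return "\n".join(parts)
```

Building both variants from the same example, such as the "your English is very good" utterance with a white speaker and a non-white colleague as listener, makes the two experimental conditions differ only in whether the context lines are present.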
