CL · Apr 3, 2020

Directions in Abusive Language Training Data: Garbage In, Garbage Out

arXiv:2004.01670v3 11.1352 citations

Originality: Synthesis-oriented

AI Analysis

The paper addresses the problem of inconsistent, low-quality training data for abusive language detection, which affects researchers and practitioners in NLP and online safety. Its contribution is incremental because it synthesizes existing knowledge.

The paper systematically reviews abusive language dataset creation and content, leading to evidence-based recommendations for practitioners working with this complex data.

Data-driven analysis and detection of abusive online content covers many different tasks, phenomena, contexts, and methodologies. This paper systematically reviews abusive language dataset creation and content in conjunction with an open website for cataloguing abusive language data. This collection of knowledge leads to a synthesis providing evidence-based recommendations for practitioners working with this complex and highly diverse data.
