CL · Apr 3, 2020

Directions in Abusive Language Training Data: Garbage In, Garbage Out

arXiv:2004.01670v3 · 346 citations
AI Analysis

The paper addresses the problem of inconsistent and low-quality training data for abusive language detection, which affects researchers and practitioners in NLP and online safety. Its contribution is incremental, since it synthesizes existing knowledge rather than introducing a new method.

The paper systematically reviews abusive language dataset creation and content, leading to evidence-based recommendations for practitioners working with this complex data.

Data-driven analysis and detection of abusive online content covers many different tasks, phenomena, contexts, and methodologies. This paper systematically reviews abusive language dataset creation and content in conjunction with an open website for cataloguing abusive language data. This collection of knowledge leads to a synthesis providing evidence-based recommendations for practitioners working with this complex and highly diverse data.

