SI CL Sep 27, 2019

HateMonitors: Language Agnostic Abuse Detection in Social Media

Punyajoy Saha, Binny Mathew, Pawan Goyal, Animesh Mukherjee

arXiv:1909.12642v17.331 citations Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of moderating abusive content on online platforms, particularly in Indo-European languages. It is incremental, as it builds on existing methods such as BERT and Gradient Boosting.

The authors tackled the problem of detecting hate speech and offensive content in social media by developing HateMonitor, a language-agnostic machine learning model that uses Gradient Boosting with BERT and LASER embeddings. The model placed first in German sub-task A of the HASOC shared task at FIRE 2019.

Reducing hateful and offensive content in online social media poses a dual problem for moderators. On the one hand, rigid censorship cannot be imposed on social media. On the other, the free flow of such content cannot be allowed. Hence, we require an efficient abusive language detection system to detect such harmful content in social media. In this paper, we present our machine learning model, HateMonitor, developed for Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC), a shared task at FIRE 2019. We have used a Gradient Boosting model, along with BERT and LASER embeddings, to make the system language agnostic. Our model placed first in German sub-task A. We have also made our model public at https://github.com/punyajoy/HateMonitors-HASOC .
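
The abstract's core idea, feeding sentence embeddings into a gradient boosting classifier, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vectors below are random stand-ins for real BERT and LASER sentence embeddings, the boosting loop is a small hand-written version using decision stumps rather than any particular library, and every function name (concat_embeddings, train_gbm, and so on) is hypothetical.

```python
import math
import random


def concat_embeddings(bert_vecs, laser_vecs):
    """Join the two embedding vectors of each post into one feature vector."""
    return [list(b) + list(l) for b, l in zip(bert_vecs, laser_vecs)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Pick the single-feature threshold split with the lowest squared error."""
    mean = sum(residuals) / len(residuals)
    # Fallback: no split at all (threshold = +inf sends every row left).
    best = (sum((r - mean) ** 2 for r in residuals), 0, math.inf, mean, mean)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:]


def train_gbm(X, y, n_rounds=20, learning_rate=0.3):
    """Gradient boosting on log loss; assumes both classes occur in y."""
    pos = sum(y) / len(y)
    f0 = math.log(pos / (1 - pos))  # initial score: log-odds of the positive class
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss with respect to the current scores.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        scores = [s + learning_rate * (lv if row[j] <= t else rv)
                  for row, s in zip(X, scores)]
    return f0, learning_rate, stumps


def predict_proba(model, x):
    f0, lr, stumps = model
    score = f0 + sum(lr * (lv if x[j] <= t else rv) for j, t, lv, rv in stumps)
    return sigmoid(score)


def make_toy_data(n=40, seed=0):
    """Random stand-ins for embeddings; the label shifts each vector's mean."""
    rng = random.Random(seed)
    bert, laser, labels = [], [], []
    for i in range(n):
        label = i % 2
        bert.append([rng.gauss(2.0 * label, 0.5) for _ in range(3)])
        laser.append([rng.gauss(-2.0 * label, 0.5) for _ in range(3)])
        labels.append(label)
    return bert, laser, labels


bert, laser, labels = make_toy_data()
X = concat_embeddings(bert, laser)
model = train_gbm(X, labels)
preds = [int(predict_proba(model, x) >= 0.5) for x in X]
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
print("training accuracy on toy data:", accuracy)
```

Because the classifier only ever sees fixed-length vectors, the same training code applies to any language the embedding models cover, which is the sense in which such a pipeline is language agnostic. A real system would replace make_toy_data with actual embedding extraction and would likely use an established gradient boosting library.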
