CL · Nov 27, 2021

Exploring Transformer Based Models to Identify Hate Speech and Offensive Content in English and Indo-Aryan Languages

Somnath Banerjee, Maulindu Sarkar, Nancy Agrawal, Punyajoy Saha, Mithun Das

arXiv:2111.13974v1

Originality: Synthesis-oriented

AI Analysis

This work addresses hate speech on online social media, which can harm users' health. It is incremental, however, because it applies existing Transformer models to new language datasets.

The paper tackles the detection of hate speech and offensive content in English and Indo-Aryan languages using Transformer-based models, achieving competitive results such as 2nd place on the code-mixed dataset (Macro F1: 0.7107) and 4th place in the English four-class category (Macro F1: 0.8006).

Hate speech is considered one of the major issues currently plaguing online social media. Repeated exposure to hate speech has been shown to create physiological effects on the targeted users. Thus, hate speech, in all its forms, should be addressed on these platforms in order to protect users' health. In this paper, we explored several Transformer-based machine learning models, including mBERT, XLMR-large, and XLMR-base, for the detection of hate speech and offensive content in English and Indo-Aryan languages at FIRE 2021, under the team name "Super Mario". Our models placed 2nd on the code-mixed dataset (Macro F1: 0.7107), 2nd in Hindi two-class classification (Macro F1: 0.7797), 4th in the English four-class category (Macro F1: 0.8006), and 12th in the English two-class category (Macro F1: 0.6447).
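Every ranking above is reported as Macro F1: the unweighted mean of the per-class F1 scores, so a rare class (for example, the hateful class in a two-class task) counts as much as a frequent one. The following minimal pure-Python sketch shows how the metric is computed. It is not the shared task's official scorer, and the label names `HOF` and `NOT` in the usage example are illustrative assumptions.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Collect all classes; sorting by str keeps the order deterministic.
    labels = sorted(set(y_true) | set(y_pred), key=str)
    f1_scores = []
    for label in labels:
        # One-vs-rest counts for the current class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    # Macro averaging: each class contributes equally, regardless of support.
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Hypothetical two-class gold labels and predictions.
    gold = ["HOF", "HOF", "NOT", "NOT", "NOT", "NOT"]
    pred = ["HOF", "NOT", "NOT", "NOT", "NOT", "HOF"]
    print(macro_f1(gold, pred))  # 0.625
```

In this toy example accuracy is 4/6 (about 0.667), while Macro F1 is 0.625, because the minority `HOF` class reaches an F1 of only 0.5 against 0.75 for `NOT`.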
