CL · Sep 28, 2021

One to rule them all: Towards Joint Indic Language Hate Speech Detection

arXiv:2109.13711v1 · 21 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of detecting hate speech in minority languages for social media moderation, but its contribution is incremental: it applies existing multilingual methods to a specific shared task.

The paper tackles hate speech detection across English, Hindi, and Marathi by developing a multilingual transformer model, achieving Macro F1 scores of up to 0.8651 for binary classification and up to 0.6268 for fine-grained classification.

This paper is a contribution to the Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) 2021 shared task. Social media today is a hotbed of toxic and hateful conversations in various languages. Recent news reports have shown that current models struggle to automatically identify hate posted in minority languages. Therefore, efficiently curbing hate speech is a critical challenge and a problem of interest. We present a multilingual architecture using state-of-the-art transformer language models to jointly learn hate and offensive speech detection across three languages, namely English, Hindi, and Marathi. On the provided testing corpora, we achieve Macro F1 scores of 0.7996, 0.7748, and 0.8651 for sub-task 1A, and 0.6268 and 0.5603 for the fine-grained classification of sub-task 1B. These results show the efficacy of exploiting a multilingual training scheme.
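The results above are reported as Macro F1, the unweighted mean of the per-class F1 scores, so a rare class counts as much as a frequent one. As a rough illustration of the metric only (not the authors' evaluation code), here is a minimal Python sketch. The function name `macro_f1`, the label names, and the toy predictions are all assumptions made for this example and are not taken from the HASOC data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy binary example with placeholder labels (hypothetical data)
y_true = ["HOF", "HOF", "NOT", "NOT", "NOT", "HOF"]
y_pred = ["HOF", "NOT", "NOT", "NOT", "HOF", "HOF"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Because every class contributes equally to the average, the same function applies unchanged to a fine-grained setting with more than two labels.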
