CL · Jul 29, 2021

IIITG-ADBU@HASOC-Dravidian-CodeMix-FIRE2020: Offensive Content Detection in Code-Mixed Dravidian Text

Arup Baruah, Kaushik Amar Das, Ferdous Ahmed Barbhuiya, Kuntal Dey

arXiv:2107.14336v1 · 1.0 · 13 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses the detection of offensive content in code-mixed Dravidian languages for social media moderation. The contribution is incremental: it applies existing methods to a new dataset.

The paper tackles offensive content detection in code-mixed Dravidian text. It achieves a weighted F1 score of 0.95 (1st rank) on the Malayalam YouTube data with an SVM classifier and 0.87 (3rd rank) on the Tamil Twitter data with an XLM-RoBERTa classifier.
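The scores above are weighted F1 scores: F1 is computed per class and the per-class values are averaged with weights equal to each class's support (its count in the gold labels). The minimal sketch below shows that computation in plain Python; the `weighted_f1` name and the OFF/NOT labels in the usage line are illustrative assumptions, not code or data from the paper.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's support in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / n  # n > 0 because the label occurs in y_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total  # weight by support
    return score


# Toy example: F1(OFF) = 2/3 with support 1, F1(NOT) = 0.8 with support 3.
print(round(weighted_f1(["OFF", "NOT", "NOT", "NOT"], ["OFF", "NOT", "OFF", "NOT"]), 4))  # 0.7667
```

Labels that occur only in the predictions contribute nothing, because their support weight is zero. This matches the usual definition of the weighted average.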

This paper presents the results obtained by our SVM and XLM-RoBERTa-based classifiers in the shared task Dravidian-CodeMix-HASOC 2020. The SVM classifier trained on TF-IDF features of character and word n-grams performed best on the code-mixed Malayalam text. It obtained weighted F1 scores of 0.95 (1st rank) and 0.76 (3rd rank) on the YouTube and Twitter datasets, respectively. The XLM-RoBERTa-based classifier performed best on the code-mixed Tamil text, obtaining a weighted F1 score of 0.87 (3rd rank) on the code-mixed Tamil Twitter dataset.
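The SVM pipeline described above (TF-IDF over character and word n-grams, then a linear classifier) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The n-gram ranges, the Pegasos solver (stochastic sub-gradient descent on the L2-regularised hinge loss, with no bias term), `lam`, `epochs`, and the toy training texts are all hypothetical choices. In practice a library such as scikit-learn (`TfidfVectorizer`, `LinearSVC`) would more likely be used, and the abstract does not state which toolkit or settings the authors chose.

```python
import math
import random
from collections import Counter


def extract_ngrams(text, word_range=(1, 2), char_range=(2, 4)):
    """Word and character n-grams; the ranges are illustrative, not the paper's settings."""
    text = text.lower()
    tokens = text.split()
    feats = []
    for n in range(word_range[0], word_range[1] + 1):
        for i in range(len(tokens) - n + 1):
            feats.append("w:" + " ".join(tokens[i:i + n]))
    for n in range(char_range[0], char_range[1] + 1):
        for i in range(len(text) - n + 1):
            feats.append("c:" + text[i:i + n])
    return feats


class NgramTfidf:
    """Sparse TF-IDF vectors (dicts) over the combined word and character n-grams."""

    def fit(self, docs):
        df = Counter()
        for doc in docs:
            df.update(set(extract_ngrams(doc)))
        n = len(docs)
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, the formula scikit-learn uses by default.
        self.idf = {f: math.log((1 + n) / (1 + d)) + 1 for f, d in df.items()}
        return self

    def transform(self, doc):
        tf = Counter(f for f in extract_ngrams(doc) if f in self.idf)  # unseen n-grams are dropped
        vec = {f: c * self.idf[f] for f, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {f: v / norm for f, v in vec.items()} if norm else vec  # L2-normalise


def dot(w, x):
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos linear SVM: labels are +1/-1, step size 1/(lam * t), no bias term."""
    rng = random.Random(seed)
    w = {}
    t = 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # hinge loss is active for this example
            for f in w:  # regularisation step: shrink every weight
                w[f] *= 1.0 - eta * lam
            if violated:
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + eta * y[i] * v
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


# Hypothetical toy data: 1 = offensive, -1 = not offensive.
docs = ["idiot", "stupid idiot", "fool idiot", "lovely song", "great song", "lovely day"]
labels = [1, 1, 1, -1, -1, -1]
vec = NgramTfidf().fit(docs)
X = [vec.transform(d) for d in docs]
w = train_linear_svm(X, labels)
print([predict(w, vec.transform(d)) for d in ["you idiot", "a lovely song"]])  # [1, -1]
```

Character n-grams are the design choice that matters for code-mixed text. They let the model match spelling variants and transliterated fragments that whole-word features would treat as unrelated tokens.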

