CL, Jul 29, 2021

IIITG-ADBU@HASOC-Dravidian-CodeMix-FIRE2020: Offensive Content Detection in Code-Mixed Dravidian Text

arXiv:2107.14336v1, 13 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the detection of offensive content in code-mixed Dravidian languages for social media moderation. Its contribution is incremental: it applies existing methods to a new dataset.

The paper tackled offensive content detection in code-mixed Dravidian text, achieving a weighted F1 score of 0.95 (1st rank) on YouTube Malayalam data with an SVM classifier and 0.87 (3rd rank) on Tamil Twitter data with an XLM-RoBERTa classifier.
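Since the results above are reported as weighted F1, a minimal sketch of that metric may help when reading the numbers. It computes per-class F1 and averages the scores weighted by each class's share of the true labels. The function name `weighted_f1` and the toy labels are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative helper)."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is positive because n >= 1
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (n / total) * f1
    return score


# Toy labels, assumed for illustration only
y_true = ["OFF", "OFF", "NOT", "NOT", "NOT", "NOT"]
y_pred = ["OFF", "NOT", "NOT", "NOT", "NOT", "OFF"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because each class is weighted by its support, a majority class dominates the score, which is worth keeping in mind when comparing results across imbalanced datasets.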

This paper presents the results obtained by our SVM- and XLM-RoBERTa-based classifiers in the shared task Dravidian-CodeMix-HASOC 2020. The SVM classifier, trained using TF-IDF features of character and word n-grams, performed the best on the code-mixed Malayalam text. It obtained weighted F1 scores of 0.95 (1st Rank) and 0.76 (3rd Rank) on the YouTube and Twitter datasets, respectively. The XLM-RoBERTa-based classifier performed the best on the code-mixed Tamil text. It obtained a weighted F1 score of 0.87 (3rd Rank) on the code-mixed Tamil Twitter dataset.
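The feature side of the SVM approach can be illustrated with a small, dependency-free sketch of TF-IDF over combined character and word n-grams. This is not the authors' implementation: the n-gram ranges, the smoothed IDF formula, the helper names `ngrams` and `tfidf`, and the toy strings are all assumptions. In practice the resulting sparse vectors would be passed to a linear SVM.

```python
import math
from collections import Counter


def ngrams(text, char_n=(2, 3), word_n=(1, 2)):
    """Return character and word n-grams as prefixed string features.

    The ranges are assumed for illustration; the abstract does not give them.
    """
    feats = []
    for n in range(char_n[0], char_n[1] + 1):
        feats += ["c:" + text[i:i + n] for i in range(len(text) - n + 1)]
    words = text.split()
    for n in range(word_n[0], word_n[1] + 1):
        feats += ["w:" + " ".join(words[i:i + n])
                  for i in range(len(words) - n + 1)]
    return feats


def tfidf(docs):
    """Map each document to a {feature: weight} dict using a smoothed IDF."""
    counts = [Counter(ngrams(d.lower())) for d in docs]
    # Document frequency: iterating a Counter yields each feature once per doc
    df = Counter(f for c in counts for f in c)
    n_docs = len(docs)
    idf = {f: math.log((1 + n_docs) / (1 + k)) + 1 for f, k in df.items()}
    return [{f: tf * idf[f] for f, tf in c.items()} for c in counts]


# Made-up romanized toy strings, for illustration only
vectors = tfidf(["padam super", "padam mosam"])
print(vectors[0]["w:padam"], vectors[0]["w:super"])
```

Character n-grams are a common choice for code-mixed text because they tolerate the inconsistent romanized spellings that word-level features treat as unrelated tokens.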

