CL · Feb 14, 2021

indicnlp@kgp at DravidianLangTech-EACL2021: Offensive Language Identification in Dravidian Languages

arXiv:2102.07150v1801 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the detection of offensive content in low-resource Dravidian languages. Its contribution is incremental, since it applies existing methods to new datasets.

The paper tackles offensive language identification in three code-mixed Dravidian languages using an ensemble of AWD-LSTM, BERT, and RoBERTa models. It achieves weighted-average F1 scores of 0.97, 0.77, and 0.72 on the Malayalam-English, Tamil-English, and Kannada-English datasets, ranking 1st, 2nd, and 3rd, respectively.

The paper presents the submission of the team indicnlp@kgp to the EACL 2021 shared task "Offensive Language Identification in Dravidian Languages." The task aimed to classify different types of offensive content in three code-mixed Dravidian language datasets. The work leverages existing state-of-the-art approaches in text classification by incorporating additional data and transfer learning on pre-trained models. Our final submission is an ensemble of an AWD-LSTM-based model along with two different transformer architectures based on BERT and RoBERTa. We achieved weighted-average F1 scores of 0.97, 0.77, and 0.72 on the Malayalam-English, Tamil-English, and Kannada-English datasets, ranking 1st, 2nd, and 3rd on the respective tasks.
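
To make the reported metric concrete, the following is a minimal sketch of how several classifiers' outputs could be combined and then scored with a weighted-average F1. The abstract does not say how the ensemble combines its members, so the averaging of class probabilities (soft voting) here is an assumption, and the function names `soft_vote` and `weighted_f1`, the toy probabilities, and the labels are all hypothetical illustrations rather than the authors' code or data.

```python
from collections import Counter


def soft_vote(prob_lists):
    """Average per-class probabilities across models and take the argmax.

    prob_lists holds one entry per model; each entry is a list with one
    probability vector per example. Soft voting is an assumed combination
    rule, not one confirmed by the abstract.
    """
    n_models = len(prob_lists)
    predictions = []
    for per_example in zip(*prob_lists):
        n_classes = len(per_example[0])
        avg = [sum(p[c] for p in per_example) / n_models for c in range(n_classes)]
        predictions.append(max(range(n_classes), key=avg.__getitem__))
    return predictions


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


# Hypothetical outputs of three models on four examples with two classes.
model_a = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3]]
model_b = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.4, 0.6]]
model_c = [[0.7, 0.3], [0.3, 0.7], [0.1, 0.9], [0.6, 0.4]]
y_true = [0, 1, 1, 1]

preds = soft_vote([model_a, model_b, model_c])
score = weighted_f1(y_true, preds)
```

Weighting each class's F1 by its support means frequent classes dominate the score, which matters when the label distribution is imbalanced.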

Code Implementations: 1 repo