CL · Feb 1, 2021

SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification

arXiv:2102.01051v2 · 32.8806 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the detection of offensive content in low-resource Dravidian languages. Its contribution is incremental: it applies existing multilingual models with task-specific adaptations.

The paper tackled offensive language identification in Dravidian languages by using an ensemble of mBERT and XLM-RoBERTa models with task-adaptive pre-training, achieving 1st place for Kannada, 2nd for Malayalam, and 3rd for Tamil in the EACL 2021 shared task.
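The summary above does not say how the mBERT and XLM-RoBERTa predictions are combined. As an illustration only, the sketch below assumes soft voting (averaging each model's class probabilities); the function names `softmax` and `soft_vote` are hypothetical and are not taken from the paper or its code.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(per_model_logits):
    """Average class probabilities across models.

    ASSUMPTION: soft voting is used here for illustration; the paper's
    actual ensembling rule is not stated in this summary.
    Returns (predicted_class_index, averaged_probabilities).
    """
    probs = [softmax(logits) for logits in per_model_logits]
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    label = max(range(n_classes), key=avg.__getitem__)
    return label, avg


# Hypothetical logits for one comment from two models over two classes.
label, avg = soft_vote([[2.0, 0.0], [0.0, 1.0]])
```

Averaging probabilities rather than raw logits keeps models with different output scales from dominating the vote, which is one reason soft voting is a common default.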

In this paper we present our submission for the EACL 2021-Shared Task on Offensive Language Identification in Dravidian languages. Our final system is an ensemble of mBERT and XLM-RoBERTa models which leverage task-adaptive pre-training of multilingual BERT models with a masked language modeling objective. Our system was ranked 1st for Kannada, 2nd for Malayalam and 3rd for Tamil.
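To make the masked language modeling objective concrete, here is a minimal sketch of the token-masking step used in standard BERT-style pre-training. The 15% masking rate and the 80/10/10 replacement split follow the original BERT recipe and are assumptions here; the abstract does not give the settings the authors used. The helper `mask_tokens` and the toy vocabulary are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (inputs, labels) for one masked-LM training example.

    ASSUMPTION: standard BERT masking (not confirmed for this paper).
    Each token is selected with probability mask_prob. A selected token
    is replaced by [MASK] 80% of the time, by a random vocabulary token
    10% of the time, and left unchanged 10% of the time. labels holds
    the original token at selected positions and None elsewhere.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict this token
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)
            elif r < 0.9:
                inputs.append(rng.choice(vocab))
            else:
                inputs.append(tok)
        else:
            labels.append(None)  # position ignored by the loss
            inputs.append(tok)
    return inputs, labels


# Toy example; a real pipeline would operate on subword IDs.
tokens = ["this", "comment", "is", "not", "offensive"]
inputs, labels = mask_tokens(tokens, vocab=tokens, seed=0)
```

In task-adaptive pre-training, this objective is run on the unlabeled text of the target task before fine-tuning the classifier, so the model adapts to the task's vocabulary and style.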
