SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification
This work addresses the detection of offensive content in low-resource Dravidian languages. The contribution is incremental: it applies existing multilingual models with task-specific adaptations.
The paper tackles offensive language identification in Dravidian languages with an ensemble of mBERT and XLM-RoBERTa models that use task-adaptive pre-training, ranking 1st for Kannada, 2nd for Malayalam, and 3rd for Tamil in the EACL 2021 shared task.
In this paper, we present our submission to the EACL 2021 Shared Task on Offensive Language Identification in Dravidian languages. Our final system is an ensemble of mBERT and XLM-RoBERTa models that leverage task-adaptive pre-training of multilingual BERT models with a masked language modeling objective. Our system ranked 1st for Kannada, 2nd for Malayalam, and 3rd for Tamil.
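The abstract does not say how the mBERT and XLM-RoBERTa predictions are combined. As a rough illustration only, the sketch below assumes soft voting, that is, averaging each model's class probabilities and taking the most likely class. The function names, the example logits, and the two labels are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw classifier logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(per_model_logits, weights=None):
    """Combine models by a weighted average of their class probabilities.

    per_model_logits: one list of logits per model, all of the same length.
    weights: optional per-model weights; defaults to a uniform average.
    Returns (index of the winning class, averaged probabilities).
    """
    if not per_model_logits:
        raise ValueError("need at least one model's logits")
    n_models = len(per_model_logits)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_classes = len(per_model_logits[0])
    averaged = [0.0] * n_classes
    for logits, weight in zip(per_model_logits, weights):
        if len(logits) != n_classes:
            raise ValueError("all models must score the same set of classes")
        for i, p in enumerate(softmax(logits)):
            averaged[i] += weight * p
    best = max(range(n_classes), key=averaged.__getitem__)
    return best, averaged


# Hypothetical label set and logits for a single comment (assumptions).
LABELS = ["Not_offensive", "Offensive"]
mbert_logits = [2.0, 0.5]  # stand-in for an mBERT classifier output
xlmr_logits = [0.2, 1.0]   # stand-in for an XLM-RoBERTa classifier output

best, averaged = soft_vote([mbert_logits, xlmr_logits])
print(LABELS[best], [round(p, 3) for p in averaged])
```

Averaging probabilities rather than taking a majority vote lets a confident model outweigh an uncertain one, which matters when only two models are combined. The optional weights could be tuned on a development set, although the source does not indicate whether the submitted system did so.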