CL · Feb 1, 2021

SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification

arXiv:2102.01051v2 · 806 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the detection of offensive content in low-resource Dravidian languages. The contribution is incremental: it applies existing multilingual models with task-specific adaptations.

The paper tackles offensive language identification in Dravidian languages with an ensemble of mBERT and XLM-RoBERTa models that use task-adaptive pre-training, ranking 1st for Kannada, 2nd for Malayalam, and 3rd for Tamil in the EACL 2021 shared task.

In this paper we present our submission for the EACL 2021-Shared Task on Offensive Language Identification in Dravidian languages. Our final system is an ensemble of mBERT and XLM-RoBERTa models which leverage task-adaptive pre-training of multilingual BERT models with a masked language modeling objective. Our system was ranked 1st for Kannada, 2nd for Malayalam and 3rd for Tamil.
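
The masked language modeling objective named in the abstract can be illustrated with a minimal sketch. It assumes the standard BERT masking recipe (15% of positions selected for prediction; of those, 80% replaced with `[MASK]`, 10% replaced with a random token, 10% left unchanged). The abstract does not state the masking rates or tokenization the authors used, so `mask_tokens`, `vocab`, and the toy tokens below are hypothetical and are not the authors' code.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Corrupt a token sequence for masked language modeling.

    Returns (corrupted, labels). labels[i] is the original token at
    positions selected for prediction and None everywhere else.
    The 15% / 80-10-10 split is the standard BERT recipe, assumed here.
    """
    if rng is None:
        rng = random.Random()
    corrupted = list(tokens)  # copy so the input is not modified
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN  # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return corrupted, labels


if __name__ == "__main__":
    # Toy example with placeholder tokens (illustrative only).
    sentence = ["tok1", "tok2", "tok3", "tok4", "tok5", "tok6"]
    toy_vocab = ["tok1", "tok2", "tok3", "tok4", "tok5", "tok6", "tok7"]
    corrupted, labels = mask_tokens(sentence, toy_vocab, rng=random.Random(0))
    print(corrupted)
    print(labels)
```

In task-adaptive pre-training, a model would be trained to recover `labels` from `corrupted` on unlabeled text from the task domain before being fine-tuned on the labeled classification data.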

Code Implementations: 1 repo
