CL · AI · Feb 19, 2021

Hate-Alert@DravidianLangTech-EACL2021: Ensembling strategies for Transformer-based Offensive language Detection

arXiv:2102.10084v1 · 801 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of detecting offensive content on social media for low-resource languages such as Tamil, Kannada, and Malayalam. Its contribution is incremental, since it builds on existing transformer and ensembling methods.

The paper tackles offensive language detection in low-resource Dravidian languages by exploring transformer models and a genetic algorithm for ensembling them, placing first in the Tamil and Malayalam sub-tasks and second in the Kannada sub-task.

Social media often acts as a breeding ground for different forms of offensive content. For low-resource languages like Tamil, the situation is more complex due to the poor performance of multilingual or language-specific models and the lack of proper benchmark datasets. Based on the shared task Offensive Language Identification in Dravidian Languages at EACL 2021, we present an exhaustive exploration of different transformer models. We also provide a genetic algorithm technique for ensembling different models. Our ensembled models, trained separately for each language, secured the first position in the Tamil, the second position in the Kannada, and the first position in the Malayalam sub-tasks. The models and code are provided.
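The abstract does not say how the genetic algorithm combines the models, so the following is only a minimal sketch of one common approach: evolving per-model weights for a weighted average of class probabilities. The function names (`ensemble_predict`, `fitness`, `evolve_weights`), the use of accuracy as the fitness measure, the population settings, and the toy probabilities are all assumptions made for illustration, not the authors' implementation.

```python
import random

def ensemble_predict(weights, model_probs):
    """Weighted sum of per-model class probabilities, then argmax per example."""
    n_examples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    preds = []
    for i in range(n_examples):
        scores = [
            sum(w * probs[i][c] for w, probs in zip(weights, model_probs))
            for c in range(n_classes)
        ]
        preds.append(max(range(n_classes), key=scores.__getitem__))
    return preds

def fitness(weights, model_probs, labels):
    """Accuracy of the weighted ensemble (assumed metric; a shared task may use F1)."""
    preds = ensemble_predict(weights, model_probs)
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def evolve_weights(model_probs, labels, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Search for ensemble weights in [0, 1] with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    n_models = len(model_probs)
    # Seed the population with one-hot vectors so each single model is a candidate.
    population = [[1.0 if j == i else 0.0 for j in range(n_models)]
                  for i in range(n_models)]
    population += [[rng.random() for _ in range(n_models)]
                   for _ in range(pop_size - n_models)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, model_probs, labels), reverse=True)
        parents = population[: pop_size // 2]  # elitism: the top half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            for k in range(n_models):
                if rng.random() < mutation_rate:  # Gaussian mutation, clipped to [0, 1]
                    child[k] = min(1.0, max(0.0, child[k] + rng.gauss(0.0, 0.1)))
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: fitness(w, model_probs, labels))

# Toy validation data: three hypothetical models, four examples, two classes.
toy_probs = [
    [[0.9, 0.1], [0.4, 0.6], [0.6, 0.4], [0.2, 0.8]],
    [[0.6, 0.4], [0.7, 0.3], [0.3, 0.7], [0.4, 0.6]],
    [[0.3, 0.7], [0.2, 0.8], [0.1, 0.9], [0.9, 0.1]],
]
toy_labels = [0, 1, 1, 1]
best_weights = evolve_weights(toy_probs, toy_labels)
print(best_weights, fitness(best_weights, toy_probs, toy_labels))
```

Because the initial population contains every single-model weighting and the best candidates always survive, the returned weights score at least as well on the validation data as the best individual model. In practice the weights would be fitted on a held-out split, separately for each language.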

Code Implementations: 1 repo