CL · Apr 23, 2020

UHH-LT at SemEval-2020 Task 12: Fine-Tuning of Pre-Trained Transformer Networks for Offensive Language Detection

arXiv:2004.11493v2 · 1009 citations
AI Analysis

This work addresses the detection of offensive language online, a problem that matters for content moderation. Its contribution is incremental: it applies existing fine-tuning methods to a specific task.

The paper tackles offensive language detection by fine-tuning pre-trained transformer networks. Its MLM fine-tuned RoBERTa-based classifier ranked first in SemEval 2020 Task 12 for English, and further experiments with ALBERT surpassed that result.

Fine-tuning of pre-trained transformer networks such as BERT yields state-of-the-art results for text classification tasks. Typically, fine-tuning is performed on task-specific training datasets in a supervised manner. One can also fine-tune in an unsupervised manner beforehand by further pre-training on the masked language modeling (MLM) task. Here, using in-domain data that resembles the actual classification target dataset for unsupervised MLM allows for domain adaptation of the model. In this paper, we compare current pre-trained transformer networks with and without MLM fine-tuning on their performance for offensive language detection. Our MLM fine-tuned RoBERTa-based classifier officially ranks 1st in the SemEval 2020 Shared Task 12 for the English language. Further experiments with the ALBERT model even surpass this result.
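To make the unsupervised MLM step more concrete, the following is a minimal sketch of how masked training examples are typically built from unlabeled in-domain text. It assumes the masking scheme from the original BERT paper (15% of tokens selected; of those, 80% replaced by a mask token, 10% by a random token, 10% left unchanged). The abstract does not state the settings the authors used, and the names `mask_tokens`, `MASK`, and the toy vocabulary are hypothetical, chosen only for illustration.

```python
import random

MASK = "[MASK]"  # assumed mask symbol; real tokenizers define their own


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Build one MLM training example from a list of tokens.

    Returns (inputs, labels). labels[i] holds the original token at
    positions the model must predict, and None everywhere else.
    The 15% / 80-10-10 scheme is an assumption taken from BERT.
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK  # replace with the mask symbol
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # replace with a random token
        # otherwise keep the original token in place
    return inputs, labels


if __name__ == "__main__":
    # Toy in-domain sentence; a real setup would use subword tokens.
    sentence = "you are such a nice person".split()
    toy_vocab = ["the", "cat", "sat", "nice", "person"]
    inputs, labels = mask_tokens(sentence, toy_vocab, rng=random.Random(0))
    print(inputs)
    print(labels)
```

In a pipeline of the kind the abstract describes, a model would first be trained to recover the `labels` from the `inputs` on unlabeled in-domain text, and only afterwards fine-tuned with supervision on the labeled offensive-language data.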
