CL Oct 5, 2020

PUM at SemEval-2020 Task 12: Aggregation of Transformer-based models' features for offensive language recognition

Piotr Janiszewski, Mateusz Skiba, Urszula Walińska

arXiv:2010.01897v1991 citations

Originality: Synthesis-oriented

AI Analysis

The paper provides a tool for identifying offensive language in text, but the contribution is incremental: it combines existing Transformer models without major innovation.

The paper tackles offensive language recognition by aggregating features from fine-tuned BERT and XLNet models. It achieves a 64.727% macro F1-score in Sub-task C, offense target identification (ranked 7th out of 40), and an 89.726% F1-score in Sub-task A, offensive language identification (ranked 64th out of 85).
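For readers unfamiliar with the metric, the following is a minimal sketch of how a macro F1-score is computed: the F1-score is calculated per class and then averaged with equal weight per class. The function name `macro_f1` and the toy labels are illustrative assumptions, not taken from the paper or the task's official scorer.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with made-up labels and predictions (not data from the paper).
y_true = ["IND", "IND", "GRP", "OTH"]
y_pred = ["IND", "GRP", "GRP", "OTH"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the average, a model that performs poorly on a rare class is penalized more than it would be under plain accuracy.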

In this paper, we describe the PUM team's entry to SemEval-2020 Task 12. Creating our solution involved leveraging two well-known pretrained models used in natural language processing: BERT and XLNet, which achieve state-of-the-art results in multiple NLP tasks. The models were fine-tuned for each subtask separately, and features taken from their hidden layers were combined and fed into a fully connected neural network. The model using aggregated Transformer features can serve as a powerful tool for the offensive language identification problem. Our team was ranked 7th out of 40 in Sub-task C (Offense target identification) with a 64.727% macro F1-score and 64th out of 85 in Sub-task A (Offensive language identification) with an 89.726% F1-score.
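The aggregation step described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It assumes the features are combined by concatenation and uses random stand-in vectors in place of real BERT and XLNet hidden-layer features. The dimensions, layer sizes, and every function name here are hypothetical.

```python
import math
import random


def concat_features(bert_feats, xlnet_feats):
    """Aggregate the two feature vectors by concatenation (an assumption)."""
    return list(bert_feats) + list(xlnet_feats)


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (untrained, for illustration)."""
    weights = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def classify(bert_feats, xlnet_feats, hidden_layer, output_layer):
    x = concat_features(bert_feats, xlnet_feats)
    h = relu(dense(x, *hidden_layer))
    return softmax(dense(h, *output_layer))


rng = random.Random(0)
BERT_DIM, XLNET_DIM, HIDDEN, N_CLASSES = 8, 8, 4, 3  # hypothetical sizes

# Stand-ins for features extracted from the fine-tuned models' hidden layers.
bert_feats = [rng.gauss(0, 1) for _ in range(BERT_DIM)]
xlnet_feats = [rng.gauss(0, 1) for _ in range(XLNET_DIM)]

hidden_layer = init_layer(rng, BERT_DIM + XLNET_DIM, HIDDEN)
output_layer = init_layer(rng, HIDDEN, N_CLASSES)
probs = classify(bert_feats, xlnet_feats, hidden_layer, output_layer)
```

In a real system the stand-in vectors would be replaced by features extracted from the fine-tuned models, and the classifier weights would be learned rather than randomly initialized.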
