CL, Aug 3, 2020

LT@Helsinki at SemEval-2020 Task 12: Multilingual or language-specific BERT?

Marc Pàmies, Emily Öhman, Kaisla Kajava, Jörg Tiedemann

arXiv:2008.00805v131.11001 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses offensive content detection on social media. Its contribution is incremental: it applies an existing method, BERT fine-tuning, to a specific shared task.

The paper tackles offensive language identification and offense target identification in tweets by fine-tuning BERT on the OLID and SOLID datasets for SemEval-2020 Task 12. The authors report that BERT can achieve state-of-the-art results on offensive tweet classification.

This paper presents the models submitted by the LT@Helsinki team for SemEval-2020 Shared Task 12. Our team participated in sub-tasks A and C, titled offensive language identification and offense target identification, respectively. In both cases we used Bidirectional Encoder Representations from Transformers (BERT), a model pre-trained by Google and fine-tuned by us on the OLID and SOLID datasets. The results show that offensive tweet classification is one of several language-based tasks where BERT can achieve state-of-the-art results.
