SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval)
This work addresses the need for automated detection of offensive content in social media. It is an incremental contribution that builds on existing efforts in natural language processing.
The paper tackles the problem of identifying and categorizing offensive language in social media by introducing a new dataset (OLID) of over 14,000 English tweets and organizing a shared task (OffensEval) with three sub-tasks. About 800 teams signed up, and 115 of them submitted results.
We present the results and the main findings of SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval). The task was based on a new dataset, the Offensive Language Identification Dataset (OLID), which contains over 14,000 English tweets. It featured three sub-tasks. In sub-task A, the goal was to discriminate between offensive and non-offensive posts. In sub-task B, the focus was on the type of offensive content in the post. Finally, in sub-task C, systems had to detect the target of the offensive posts. OffensEval attracted a large number of participants and it was one of the most popular tasks in SemEval-2019. In total, about 800 teams signed up to participate in the task, and 115 of them submitted results, which we present and analyze in this report.
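The three sub-tasks can be read as a nested labeling scheme, where the type (sub-task B) and target (sub-task C) labels only apply to posts already judged offensive. The following is a minimal sketch of that reading, not the organizers' code. The label strings (OFF/NOT, TIN/UNT, IND/GRP/OTH) are assumed from the OLID annotation scheme and do not appear in the text above. The `Annotation` class and `is_valid` helper are hypothetical names introduced for illustration, and the rule that a target is given only for targeted posts is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed label inventories (from the OLID annotation scheme, not the abstract).
SUBTASK_A = {"OFF", "NOT"}         # offensive vs. not offensive
SUBTASK_B = {"TIN", "UNT"}         # targeted insult/threat vs. untargeted
SUBTASK_C = {"IND", "GRP", "OTH"}  # individual, group, or other target


@dataclass(frozen=True)
class Annotation:
    """Hypothetical container for one tweet's labels across the three sub-tasks."""
    subtask_a: str
    subtask_b: Optional[str] = None
    subtask_c: Optional[str] = None


def is_valid(ann: Annotation) -> bool:
    """Check that the labels respect the assumed nesting A -> B -> C."""
    if ann.subtask_a not in SUBTASK_A:
        return False
    if ann.subtask_a == "NOT":
        # Non-offensive posts carry no type or target label.
        return ann.subtask_b is None and ann.subtask_c is None
    if ann.subtask_b not in SUBTASK_B:
        return False
    if ann.subtask_b == "UNT":
        # Untargeted offensive posts carry no target label.
        return ann.subtask_c is None
    return ann.subtask_c in SUBTASK_C
```

Under these assumptions, `Annotation("OFF", "TIN", "GRP")` is a well-formed label triple, while `Annotation("NOT", "TIN", "IND")` is rejected because a non-offensive post cannot have a type or target.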