CL Feb 25, 2019

Predicting the Type and Target of Offensive Posts in Social Media

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar

arXiv:1902.09666v233.41246 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for a more comprehensive approach to detecting offensive content in social media, though it is incremental as it builds on existing datasets and methods.

The paper tackles the problem of identifying offensive content in social media by modeling it hierarchically to predict both the type and target of offensive messages, resulting in the creation of the OLID dataset and experiments with machine learning models.

As offensive content has become pervasive in social media, there has been much research in identifying potentially offensive messages. However, previous work on this topic did not consider the problem as a whole, but rather focused on detecting very specific types of offensive content, e.g., hate speech, cyberbullying, or cyber-aggression. In contrast, here we target several different kinds of offensive content. In particular, we model the task hierarchically, identifying the type and the target of offensive messages in social media. For this purpose, we compiled the Offensive Language Identification Dataset (OLID), a new dataset with tweets annotated for offensive content using a fine-grained three-layer annotation scheme, which we make publicly available. We discuss the main similarities and differences between OLID and pre-existing datasets for hate speech identification, aggression detection, and similar tasks. We further experiment with different machine learning models on OLID and compare their performance.
