Predicting Different Types of Subtle Toxicity in Unhealthy Online Conversations
The paper addresses the challenge of identifying subtler forms of abuse, such as hostility and sarcasm, on online platforms. The contribution is incremental, as it applies existing methods to a specific domain.
This paper tackles the problem of classifying subtle toxicity in online conversations, achieving a top micro F1-score of 88.76% and a macro F1-score of 67.98% on a dataset of 44K comments. Hostile comments were easier to detect than other types.
This paper investigates the use of machine learning models for the classification of unhealthy online conversations containing one or more forms of subtler abuse, such as hostility, sarcasm, and generalization. We leveraged a public dataset of 44K online comments, both healthy and unhealthy, labeled with seven forms of subtle toxicity. We were able to distinguish between these comments with a top micro F1-score, macro F1-score, and ROC-AUC of 88.76%, 67.98%, and 0.71, respectively. Hostile comments were easier to detect than other types of unhealthy comments. We also conducted a sentiment analysis, which revealed that most types of unhealthy comments were associated with a slightly negative sentiment, with hostile comments being the most negative.
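The abstract reports both a micro and a macro F1-score, and the two differ by about 20 points. The following is a minimal sketch of how the two averages are computed for multi-label predictions. It is not the paper's evaluation code: the two label names, the toy data, and the helper functions `f1_from_counts` and `micro_macro_f1` are hypothetical and chosen only for illustration. Micro F1 pools true positives, false positives, and false negatives across all labels, so frequent labels dominate. Macro F1 averages the per-label scores, so a rare, poorly detected label pulls it down.

```python
from typing import List, Sequence, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> Tuple[float, float]:
    """Return (micro F1, macro F1) for binary indicator matrices."""
    n_labels = len(y_true[0])
    counts: List[Tuple[int, int, int]] = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))

    # Macro: unweighted mean of the per-label F1 scores.
    macro = sum(f1_from_counts(*c) for c in counts) / n_labels

    # Micro: pool the counts over all labels, then compute a single F1.
    micro = f1_from_counts(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    return micro, macro


# Hypothetical toy data: columns are [healthy, sarcastic].
# Eight healthy comments and two sarcastic ones (an imbalanced label set).
y_true = [[1, 0]] * 8 + [[0, 1]] * 2
# The classifier gets every healthy comment right but misses one sarcastic
# comment, labeling it healthy instead.
y_pred = [[1, 0]] * 8 + [[0, 1], [1, 0]]

micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

In this toy setting the frequent label is classified almost perfectly while the rare label is missed half the time, so micro F1 comes out higher than macro F1. A gap in the same direction in the reported results may reflect a similar imbalance among the labels, though the abstract alone does not confirm this.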