Defining, Understanding, and Detecting Online Toxicity: Challenges and Machine Learning Approaches
This work addresses online toxicity, a pervasive problem for digital platforms and researchers alike. Its contribution is incremental: it reviews and synthesizes existing literature rather than introducing new methods.
The study synthesizes 140 publications into an overview of the datasets, definitions, and machine learning approaches used to detect toxic online content, such as hate speech and offensive language, across 32 languages. It also examines whether cross-platform data can improve classification models and offers practical guidelines for mitigation.
Online toxic content has grown into a pervasive phenomenon, intensifying during times of crisis, elections, and social unrest. Its proliferation across digital platforms has spurred extensive research into automated detection and analysis, driven primarily by advances in machine learning and natural language processing. The present study synthesizes 140 publications on different types of toxic content on digital platforms. We present a comprehensive overview of the datasets used in previous studies, focusing on the definitions, data sources, challenges, and machine learning approaches employed in detecting online toxicity, such as hate speech, offensive language, and harmful discourse. The datasets reviewed cover content in 32 languages and topics such as elections, spontaneous events, and crises. We examine whether existing cross-platform data can be used to improve the performance of classification models. We offer recommendations and guidelines for new research on online toxic content and on the use of content moderation for mitigation. Finally, we present practical guidelines for mitigating toxic content on online platforms.