Analysing Cyberbullying using Natural Language Processing by Understanding Jargon in Social Media
This work addresses cyberbullying detection for minors on social media. It is incremental, since it builds on existing NLP methods and adds a new preprocessing step.
The paper tackles cyberbullying detection by using a slang-abusive corpus for preprocessing across multiple social media datasets, achieving higher precision than models without this technique.
Cyberbullying is extremely prevalent today. Online hate comments, toxicity, and cyberbullying amongst children and other vulnerable groups are growing with online classes and increased access to social platforms, especially since COVID-19. It is paramount to ensure minors' safety across social platforms so that violence and hate crime are automatically detected and strict action is taken against them. In our work, we explore binary classification using a combination of datasets from various social media platforms that cover a wide range of cyberbullying, such as sexism, racism, abusive language, and hate speech. We experiment with multiple models such as Bi-LSTM, GloVe, and state-of-the-art models like BERT, and apply a unique preprocessing technique by introducing a slang-abusive corpus, achieving higher precision than models without slang preprocessing.
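To make the slang preprocessing idea concrete, the following is a minimal sketch of one way such a step could work: social media slang is expanded to plain wording before the text reaches a classifier, and precision is computed so that runs with and without the step can be compared. The lexicon entries, the tokenizer pattern, and the names `SLANG_MAP`, `normalize_slang`, and `precision` are all assumptions made for illustration; they are not the paper's actual corpus or pipeline.

```python
import re

# Hypothetical slang lexicon for illustration only; the paper's
# slang-abusive corpus is not reproduced here.
SLANG_MAP = {
    "u": "you",
    "r": "are",
    "ur": "your",
    "stfu": "shut up",
    "gtfo": "get out",
}

# Simple tokenizer (an assumption): lowercase runs of letters, digits, apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def normalize_slang(text, slang_map=SLANG_MAP):
    """Lowercase the text, tokenize it, and expand known slang tokens."""
    tokens = TOKEN_RE.findall(text.lower())
    return " ".join(slang_map.get(tok, tok) for tok in tokens)


def precision(y_true, y_pred):
    """Precision for the positive (bullying) class: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


if __name__ == "__main__":
    print(normalize_slang("U r so dumb, STFU!"))
    print(precision([1, 0, 1, 1], [1, 1, 0, 1]))
```

In a setup like this, the same classifier would be trained once on raw text and once on the output of `normalize_slang`, and the two precision values compared. A dictionary lookup is the simplest possible design; it ignores context, so a token such as "r" is expanded even where it is not slang.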