LG CL CY SI · Dec 21, 2024

Identifying Cyberbullying Roles in Social Media

Manuel Sandoval, Mohammed Abuhamad, Patrick Furman, Mujtaba Nazari, Deborah L. Hall, Yasin N. Silva

arXiv:2412.16417v14.61 citations · h-index: 2 · ASONAM

Originality: Synthesis-oriented

AI Analysis

This work addresses the detection of cyberbullying on social media, a problem that particularly affects children and adolescents. Its contribution is incremental: it builds on existing methods with new data and optimizations.

The study uses machine learning models to detect cyberbullying roles in social media interactions. A fine-tuned RoBERTa model achieved an overall F1 score of 83.5%, which rose to 89.3% after applying a prediction threshold.
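The summary does not say how the prediction threshold is applied. One common reading, sketched below as an assumption, is that predictions whose top class probability falls below the threshold are set aside and the F1 score is computed only on the remaining ones. The function names, the toy probabilities, the 0.6 threshold, and the choice of macro-averaged F1 are all illustrative and are not taken from the paper.

```python
def apply_threshold(probs, threshold):
    """Keep (sample_index, predicted_class) pairs whose top probability
    is at least `threshold`. Assumed behaviour, not the paper's code."""
    kept = []
    for i, row in enumerate(probs):
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= threshold:
            kept.append((i, best))
    return kept


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1, skipping classes that appear in
    neither the labels nor the predictions."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        if tp + fp + fn == 0:
            continue
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores) if scores else 0.0


# Toy class probabilities for four samples and three classes (made up).
probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
]
y_true = [0, 1, 1, 2]

# Score every prediction, regardless of confidence.
all_pred = [max(range(len(r)), key=r.__getitem__) for r in probs]
f1_all = macro_f1(y_true, all_pred, 3)

# Score only the confident predictions.
kept = apply_threshold(probs, threshold=0.6)
f1_kept = macro_f1([y_true[i] for i, _ in kept], [p for _, p in kept], 3)
coverage = len(kept) / len(probs)

print(round(f1_all, 3), round(f1_kept, 3), coverage)  # 0.778 1.0 0.5
```

In this toy case the score rises because the two low-confidence samples are excluded, so a thresholded F1 is best read together with the fraction of samples that remain covered.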

Social media has revolutionized communication, allowing people worldwide to connect and interact instantly. However, it has also led to increases in cyberbullying, which poses a significant threat to children and adolescents globally, affecting their mental health and well-being. It is critical to accurately detect the roles of individuals involved in cyberbullying incidents to effectively address the issue on a large scale. This study explores the use of machine learning models to detect the roles involved in cyberbullying interactions. After examining the AMiCA dataset and addressing class imbalance issues, we evaluate the performance of various models built with four underlying LLMs (i.e., BERT, RoBERTa, T5, and GPT-2) for role detection. Our analysis shows that oversampling techniques help improve model performance. The best model, a fine-tuned RoBERTa using oversampled data, achieved an overall F1 score of 83.5%, increasing to 89.3% after applying a prediction threshold. The top-2 F1 score without thresholding was 95.7%. Our method outperforms previously proposed models. After investigating the per-class model performance and confidence scores, we show that the models perform well in classes with more samples and less contextual confusion (e.g., Bystander Other), but struggle with classes with fewer samples (e.g., Bystander Assistant) and more contextual ambiguity (e.g., Harasser and Victim). This work highlights current strengths and limitations in the development of accurate models with limited data and complex scenarios.
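Two ideas in the abstract can be illustrated with a short sketch: oversampling to counter class imbalance, and top-2 evaluation. The abstract does not name the oversampling technique or define how the top-2 score is computed, so the code below makes assumptions: it uses random duplication of minority-class samples, and it reports a top-2 hit rate (the true label is among the two highest-scoring classes) rather than the paper's top-2 F1. The example posts and probabilities are made up; only the role names come from the abstract.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest one. Illustrative only."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


def top2_hit_rate(probs, y_true):
    """Fraction of samples whose true class is among the two classes
    with the highest predicted probability."""
    hits = 0
    for row, true_class in zip(probs, y_true):
        ranked = sorted(range(len(row)), key=row.__getitem__, reverse=True)
        hits += true_class in ranked[:2]
    return hits / len(y_true)


# Hypothetical, imbalanced training data.
posts = ["post a", "post b", "post c", "post d", "post e", "post f"]
roles = ["Bystander Other", "Bystander Other", "Bystander Other",
         "Harasser", "Harasser", "Bystander Assistant"]

balanced_posts, balanced_roles = random_oversample(posts, roles)
print(Counter(balanced_roles))  # every role now has 3 samples

# Made-up class probabilities for four samples and three classes.
probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
]
print(top2_hit_rate(probs, [0, 2, 1, 2]))  # 0.75
```

Oversampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into the evaluation data.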
